Abstract

In this paper we improve upon the running time for finding a point in a convex set given
a separation oracle. In particular, given a separation oracle for a convex set $K \subset \mathbb{R}^n$ that is
contained in a box of radius $R$, we show how to either compute a point in $K$ or prove that $K$
does not contain a ball of radius $\epsilon$ using an expected $O(n \log(nR/\epsilon))$ evaluations of the oracle
and additional time $O(n^3 \log^{O(1)}(nR/\epsilon))$. This matches the oracle complexity and improves
upon the $O(n^{\omega+1} \log(nR/\epsilon))$ additional time of the previous fastest algorithm, achieved over 25
years ago by Vaidya [103], for the current value of the matrix multiplication constant $\omega < 2.373$
[110, 41] when $R/\epsilon = O(\mathrm{poly}(n))$.

Using a mix of standard reductions and new techniques we show how our algorithm can be 
used to improve the running time for solving classic problems in continuous and combinatorial 
optimization. In particular we provide the following running time improvements: 

• Submodular Function Minimization: let n be the size of the ground set, M be the 
maximum absolute value of function values, and EO be the time for function evaluation. 

Our weakly and strongly polynomial time algorithms have running times of $O(n^2 \log nM \cdot \mathrm{EO} + n^3 \log^{O(1)} nM)$
and $O(n^3 \log^2 n \cdot \mathrm{EO} + n^4 \log^{O(1)} n)$, improving upon the previous
best of $O((n^4 \cdot \mathrm{EO} + n^5) \log M)$ and $O(n^5 \cdot \mathrm{EO} + n^6)$ respectively.

• Matroid Intersection: let n be the size of the ground set, r be the maximum size of 
independent sets, $M$ be the maximum absolute value of element weight, and $T_{\mathrm{rank}}$ and
$T_{\mathrm{ind}}$ be the time for each rank and independence oracle query.

We obtain running times of $O(nr T_{\mathrm{rank}} \log n \log(nM) + n^3 \log^{O(1)} nM)$ and $O(n^2 T_{\mathrm{ind}} \log(nM) + n^3 \log^{O(1)} nM)$,
achieving the first quadratic bound on the query complexity for the independence
and rank oracles. In the unweighted case, this is the first improvement since
1986 for the independence oracle.

• Submodular Flow: let n and m be the number of vertices and edges, C be the maximum 
edge cost in absolute value, and U be the maximum edge capacity in absolute value. 

We obtain a faster weakly polynomial running time of $O(n^2 \log nCU \cdot \mathrm{EO} + n^3 \log^{O(1)} nCU)$,
improving upon the previous best of $O(mn^5 \log nU \cdot \mathrm{EO})$ and $O(n^4 h \min\{\log C, \log U\})$
from 15 years ago by a factor of $\tilde{O}(n^4)$. We also achieve faster strongly polynomial time
algorithms as a consequence of our result on submodular minimization.

• Semidefinite Programming: let n be the number of constraints, m be the number of 
dimensions and S be the total number of non-zeros in the constraint matrix. 

We obtain a running time of $\tilde{O}(n(n^2 + m^\omega + S))$, improving upon the previous best of
$\tilde{O}(n(n^\omega + m^\omega + S))$ in the regime where $S$ is small.
Part

Overview 


1 Introduction 

The ellipsoid method and, more generally, cutting plane methods,^ that is, optimization algorithms
that iteratively call a separation oracle, have long been central to theoretical computer science. In
combinatorial optimization, since Khachiyan’s seminal result in 1980 [65] proving that the ellipsoid 
method solves linear programs in polynomial time, the ellipsoid method has been crucial to solving 
discrete problems in polynomial time [49]. In continuous optimization, cutting plane methods have 
long played a critical role in convex optimization, where they are fundamental to the theory of 
non-smooth optimization [45]. 

Despite the key role that cutting plane methods have played historically in both combinatorial
and convex optimization, over the past two decades progress on improving both the theoretical
running time of cutting plane methods and the complexity of using cutting plane methods
for combinatorial optimization has stagnated.^ The theoretical running time of cutting plane
methods for convex optimization has not been improved since the breakthrough result by Vaidya
in 1989 [103, 105]. Moreover, for many of the key combinatorial applications of the ellipsoid method,
such as submodular minimization, matroid intersection and submodular flow, the running time
improvements over the past two decades have been primarily combinatorial; that is, they have been
achieved by discrete algorithms that do not use numerical machinery such as cutting plane methods.

In this paper we make progress on these classic optimization problems on two fronts. First, we
show how to improve on the running time of cutting plane methods for a broad range of parameters
that arise frequently in both combinatorial applications and convex programming (Part I). Second,
we provide several frameworks for applying the cutting plane method and illustrate the efficacy of
these frameworks by obtaining faster running times for semidefinite programming, matroid intersection,
and submodular flow (Part II). Finally, we show how to couple our approach with problem-specific
structure and obtain faster weakly and strongly polynomial running times for submodular
function minimization, a problem of tremendous importance in combinatorial optimization (Part
III). In both cases our algorithms are faster than the previous best by a factor of roughly $\Omega(n^2)$.

We remark that many of our running time improvements come both from our faster cutting
plane method and from new careful analysis of how to apply these cutting plane methods. In fact, simply
using our reductions to cutting plane methods and a seminal result of Vaidya [103, 105] on cutting
plane methods, we provide running times for solving many of these problems that improve upon
the previous best stated. As such, we have organized our presentation in the hope of making it easy to apply
cutting plane methods to optimization problems and obtain provable guarantees in the future.

Our results demonstrate the power of cutting plane methods in theory and possibly pave the 
way for new cutting plane methods in practice. We show how cutting plane methods can continue to 
improve running times for classic optimization problems and we hope that these methods may find 
further use. As cutting plane methods such as the analytic cutting plane method [43, 10, 44, 87, 111, 45]
are frequently used in practice [48, 42], these techniques may have further implications.

^Throughout this paper our focus is on algorithms for polynomial time solvable convex optimization problems
given access to a linear separation oracle. Our usage of the term cutting plane methods should not be confused with
work on integer programming, an NP-hard problem.

^There are exceptions to this trend. For example, [70] showed how to apply cutting plane methods to yield running 
time improvements for semidefinite programming, and recently [15] showed how to use cutting plane methods to obtain 
an optimal result for smooth optimization problems. 
1.1 Paper Organization

After providing an overview of our results (Section 2) and preliminary information and notation 
used throughout the paper (Section 3), we split the remainder of the paper into three parts: 

• In Part I we provide our new cutting plane method. 

• In Part II we provide several general frameworks for using this cutting plane method and 
illustrate these frameworks with applications in combinatorial and convex optimization. 

• In Part III we then consider the more specific problem of submodular function minimization 
and show how our methods can be used to improve the running time for both strongly and 
weakly polynomial time algorithms. 

We aim to make each part relatively self-contained. While each part builds upon the previous and
the problems considered in each part are increasingly specific, we present the key results in each 
section in a modular way so that they may be read in any order. The dependencies between the 
different parts of our paper are characterized by the following: 

• Part I presents our faster cutting plane method as Theorem 31. 

• Part II depends only on Theorem 31 of Part I and presents a general running time guarantee 
for convex optimization problems as Theorem 42. 

• The faster weakly polynomial time algorithm in Part III depends only on Theorem 42 of Part II.

• The faster strongly polynomial time algorithm in Part III depends only on Theorem 31 of Part I.


2 Overview of Our Results 

Here we briefly summarize the contributions of our paper. For each of Part I, Part II, and Part III 
we describe the key technical contributions and present the running time improvements achieved. 

2.1 Cutting Plane Methods 

The central problem we consider in Part I is as follows. We are promised that a set $K$ is contained in
a box of radius $R$, and we are given a separation oracle that, given a point $x$, in time SO either outputs that $x$
is in $K$ or outputs a separating hyperplane. We wish to either find a point in $K$ or prove that $K$
does not contain a ball of radius $\epsilon$. The running times for this problem are given in Table 1.


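Year | Algorithm | Complexity
1979 | Ellipsoid Method [97, 112, 65] | $O(n^2 \cdot \mathrm{SO} \cdot \log\kappa + n^4 \log\kappa)$
1988 | Inscribed Ellipsoid [66, 88] | $O(n \cdot \mathrm{SO} \cdot \log\kappa + (n \log\kappa)^{4.5})$
1989 | Volumetric Center [103] | $O(n \cdot \mathrm{SO} \cdot \log\kappa + n^{1+\omega} \log\kappa)$
1995 | Analytic Center [10] | $O(n \cdot \mathrm{SO} \cdot \log^2\kappa + n^{\omega+1} \log^2\kappa + (n \log\kappa)^{2+\omega/2})$
2004 | Random Walk [13] | $\to O(n \cdot \mathrm{SO} \cdot \log\kappa + n^7 \log\kappa)$
2013 | This paper | $O(n \cdot \mathrm{SO} \cdot \log\kappa + n^3 \log^{O(1)}\kappa)$

Table 1: Algorithms for the Feasibility Problem. $\kappa$ indicates $nR/\epsilon$. The arrow, $\to$, indicates that
it solves a more general problem where only a membership oracle is given.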


In Part I we show how to solve this problem in $O(n \cdot \mathrm{SO} \cdot \log(nR/\epsilon) + n^3 \log^{O(1)}(nR/\epsilon))$ time.

This is an improvement over the previous best running time of $O(n \cdot \mathrm{SO} \cdot \log(nR/\epsilon) + n^{\omega+1} \log(nR/\epsilon))$
for the current best known bound of $\omega < 2.37$ [41], assuming that $R/\epsilon = O(\mathrm{poly}(n))$, a common
assumption for many problems in combinatorial optimization and numerical analysis, as we find in
Part II and Part III. (See Table 1 for a summary of previous running times.)

Our key idea for achieving this running time improvement is a new, straightforward technique
for providing low variance unbiased estimates for changes in leverage scores that we hope will be of
independent interest (see Section 7.1). We show how to use this technique along with ideas from
[10, 104, 76] to decrease the $O(n^{\omega+1} \log(D/\epsilon))$ overhead in the previous fastest algorithm [103].

2.2 Convex Optimization 

In Part II we provide two techniques for applying our cutting plane method (and cutting plane 
methods in general) to optimization problems and provide several applications of these techniques. 

The first technique concerns reducing the number of dimensions through duality. For many
problems, the dual is significantly simpler than the primal. We use semidefinite programming
as a concrete example to show how to improve upon the running time for finding both primal and
dual solutions by using the cutting planes maintained by our cutting plane method. (See Table 2.)

The second technique concerns how to minimize a linear function over the intersection of convex
sets using an optimization oracle. We analyze a simple potential function which allows us to bypass
the typical reduction between separation and optimization to achieve faster running times. This
reduction provides an improvement over the reductions used previously in [49]. Moreover, we
show how this technique allows us to achieve improved running times for matroid intersection and
minimum cost submodular flow. (See Tables 2, 3, 4, and 5 for running time summaries.)


Authors | Years | Running times
Nesterov, Nemirovsky [89] | 1992 | $\tilde{O}(\sqrt{m}(nm^\omega + n^{\omega-1}m^2))$
Anstreicher [7] | 2000 | $\tilde{O}((mn)^{1/4}(nm^\omega + n^{\omega-1}m^2))$
Krishnan, Mitchell [70] | 2003 | $\tilde{O}(n(n^\omega + m^\omega + S))$ (dual SDP)
This paper | 2015 | $\tilde{O}(n(n^2 + m^\omega + S))$

Table 2: Algorithms for solving an $m \times m$ SDP with $n$ constraints and $S$ non-zero entries.


Authors | Years | Complexity
Edmonds [26] | 1968 | not stated
Aigner, Dowling [2] | 1971 | $O(nr^2 T_{\mathrm{ind}})$
Tomizawa, Iri [102] | 1974 | not stated
Lawler [72] | 1975 | $O(nr^2 T_{\mathrm{ind}})$
Edmonds [28] | 1979 | not stated
Cunningham [21] | 1986 | $O(nr^{1.5} T_{\mathrm{ind}})$
This paper | 2015 | $O(n^2 \log n \, T_{\mathrm{ind}} + n^3 \log^{O(1)} n)$; $O(nr \log^2 n \, T_{\mathrm{rank}} + n^3 \log^{O(1)} n)$

Table 3: Algorithms for (unweighted) matroid intersection. $n$ is the size of the ground set, $r$ is the
maximum rank of the two matroids, $T_{\mathrm{ind}}$ is the time to check if a set is independent (membership
oracle), and $T_{\mathrm{rank}}$ is the time to compute the rank of a given set (rank oracle).
Authors | Years | Running times
Edmonds [26] | 1968 | not stated
Tomizawa, Iri [102] | 1974 | not stated
Lawler [72] | 1975 | $O(nr^2 T_{\mathrm{ind}} + nr^3)$
Edmonds [28] | 1979 | not stated
Frank [33] | 1981 | $O(n^2 r (T_{\mathrm{circuit}} + n))$
Orlin, Ahuja [91] | 1983 | not stated
Brezovec, Cornuéjols, Glover [14] | 1986 | $O(nr(T_{\mathrm{circuit}} + r + \log n))$
Fujishige, Zhang [39] | 1995 | $O(n^2 r^{0.5} \log rM \cdot T_{\mathrm{ind}})$
Shigeno, Iwata [96] | 1995 | $O((n + T_{\mathrm{circuit}}) n r^{0.5} \log rM)$
This paper | 2015 | $O(n^2 \log nM \, T_{\mathrm{ind}} + n^3 \log^{O(1)} nM)$; $O(nr \log n \log nM \, T_{\mathrm{rank}} + n^3 \log^{O(1)} nM)$

Table 4: Algorithms for weighted matroid intersection. In addition to the notation in Table 3,
$T_{\mathrm{circuit}}$ is the time needed to find a fundamental circuit and $M$ is the bit complexity of the weights.


Authors | Years | Running times
Fujishige [35] | 1978 | not stated
Grötschel, Lovász, Schrijver [49] | 1981 | weakly polynomial
Zimmermann [113] | 1982 | not stated
Barahona, Cunningham [12] | 1984 | not stated
Cunningham, Frank [22] | 1985 | $\to O(n^4 h \log C)$
Fujishige [36] | 1987 | not stated
Frank, Tardos [34] | 1987 | strongly polynomial
Cui, Fujishige [108] | 1988 | not stated
Fujishige, Röck, Zimmermann [38] | 1989 | $\to O(n^6 h \log n)$
Chung, Tcha [18] | 1991 | not stated
Zimmermann [114] | 1992 | not stated
McCormick, Ervolina [82] | 1993 | $O(n^7 h^* \log nCU)$
Wallacher, Zimmermann [109] | 1994 | $O(n^8 h \log nCU)$
Iwata [52] | 1997 | $O(n^7 h \log U)$
Iwata, McCormick, Shigeno [57] | 1998 | $O(n^4 h \min\{\log nC, n^2 \log n\})$
Iwata, McCormick, Shigeno [58] | 1999 | $O(n^6 h \min\{\log nU, n^2 \log n\})$
Fleischer, Iwata, McCormick [32] | 1999 | $O(n^4 h \min\{\log U, n^2 \log n\})$
Iwata, McCormick, Shigeno [59] | 1999 | $O(n^4 h \min\{\log C, n^2 \log n\})$
Fleischer, Iwata [30] | 2000 | $O(mn^5 \log nU \cdot \mathrm{EO})$
This paper | 2015 | $O(n^2 \log nCU \cdot \mathrm{EO} + n^3 \log^{O(1)} nCU)$

Table 5: Algorithms for minimum cost submodular flow with $n$ vertices, maximum cost $C$ and
maximum capacity $U$. The factor $h$ is the time for an exchange capacity oracle, $h^*$ is the time
for a “more complicated exchange capacity oracle,” and EO is the time for the evaluation oracle of
the submodular function. The arrow, $\to$, indicates that it uses the current best submodular flow
algorithm as a subroutine, which was non-existent at the time of the publication.
2.3 Submodular Function Minimization


In Part III we consider the problem of submodular minimization, a fundamental problem in combinatorial
optimization with many diverse applications in theoretical computer science, operations
research, machine learning and economics. We show that by considering the interplay between the
guarantees of our cutting plane algorithm and the primal-dual structure of submodular minimization
we can achieve improved running times in various settings.

First, we show that a direct application of our method yields an improved weakly polynomial
time algorithm for submodular minimization. Then, we present a simple geometric argument
that submodular function minimization can be solved with $O(n^2 \log n \cdot \mathrm{EO})$ oracle calls but with exponential
running time. Finally, we show that by further studying the combinatorial structure of submodular
minimization and modifying our cutting plane algorithm, we can obtain a fully improved
strongly polynomial time algorithm for submodular minimization. We summarize the improvements
in Table 6.


Authors | Years | Running times | Remarks
Grötschel, Lovász, Schrijver [49, 50] | 1981, 1988 | $\tilde{O}(n^5 \cdot \mathrm{EO} + n^7)$ [81] | first weakly and strongly
Cunningham [20] | 1985 | $O(Mn^6 \log nM \cdot \mathrm{EO})$ | first combin. pseudopoly
Schrijver [93] | 2000 | $O(n^8 \cdot \mathrm{EO} + n^9)$ | first combin. strongly
Iwata, Fleischer, Fujishige [56] | 2000 | $O(n^5 \cdot \mathrm{EO} \log M)$; $O(n^7 \log n \cdot \mathrm{EO})$ | first combin. strongly
Iwata, Fleischer [31] | 2000 | $O(n^7 \cdot \mathrm{EO} + n^8)$
Iwata [54] | 2003 | $O((n^4 \cdot \mathrm{EO} + n^5) \log M)$; $O((n^6 \cdot \mathrm{EO} + n^7) \log n)$ | current best weakly
Vygen [107] | 2003 | $O(n^7 \cdot \mathrm{EO} + n^8)$
Orlin [90] | 2007 | $O(n^5 \cdot \mathrm{EO} + n^6)$ | current best strongly
Iwata, Orlin [60] | 2009 | $O((n^4 \cdot \mathrm{EO} + n^5) \log nM)$; $O((n^5 \cdot \mathrm{EO} + n^6) \log n)$
Our algorithms | 2015 | $O(n^2 \log nM \cdot \mathrm{EO} + n^3 \log^{O(1)} nM)$; $O(n^3 \log^2 n \cdot \mathrm{EO} + n^4 \log^{O(1)} n)$

Table 6: Algorithms for submodular function minimization.


3 Preliminaries 

Here we introduce notation and concepts we use throughout the paper. 

3.1 Notation 

Basics: Throughout this paper, we use vector notation, e.g. $x = (x_1, \ldots, x_n)$, to denote a vector
and bold, e.g. $\mathbf{A}$, to denote a matrix. We use $\mathrm{nnz}(x)$ or $\mathrm{nnz}(\mathbf{A})$ to denote the number of nonzero
entries in a vector or a matrix respectively. Frequently, for $x \in \mathbb{R}^n$ we let $\mathbf{X} \in \mathbb{R}^{n \times n}$ denote $\mathrm{diag}(x)$,
the diagonal matrix such that $\mathbf{X}_{ii} = x_i$. For a symmetric matrix, $\mathbf{M}$, we let $\mathrm{diag}(\mathbf{M})$ denote the
vector corresponding to the diagonal entries of $\mathbf{M}$, and for a vector, $x$, we let $\|x\|_{\mathbf{M}} := \sqrt{x^\top \mathbf{M} x}$.


















Running Times: We typically use XO to denote the running time for invoking the oracle, where X
depends on the type of oracle, e.g., SO typically denotes the running time of a separation oracle, EO
denotes the running time of an evaluation oracle, etc. Furthermore, we use $\tilde{O}(f) := O(f \log^{O(1)} f)$.

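Spectral Approximations: For symmetric matrices $\mathbf{N}, \mathbf{M} \in \mathbb{R}^{n \times n}$, we write $\mathbf{N} \preceq \mathbf{M}$ to denote
that $x^\top \mathbf{N} x \le x^\top \mathbf{M} x$ for all $x \in \mathbb{R}^n$, and we define $\mathbf{N} \prec \mathbf{M}$, $\mathbf{N} \succeq \mathbf{M}$ and $\mathbf{N} \succ \mathbf{M}$ analogously.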
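As a small illustration of this notation, the following Python sketch (assuming NumPy; the helper names `loewner_leq` and `m_norm` are hypothetical, not from the paper) checks $\mathbf{N} \preceq \mathbf{M}$ by testing whether $\mathbf{M} - \mathbf{N}$ is positive semidefinite, and evaluates $\|x\|_{\mathbf{M}}$. The tolerance is an illustrative choice to absorb floating-point round-off.

```python
import numpy as np


def loewner_leq(N, M, tol=1e-10):
    """Return True if N ⪯ M, i.e. x^T N x <= x^T M x for all x.

    For symmetric N and M this holds exactly when M - N is positive
    semidefinite, i.e. when its smallest eigenvalue is >= 0 (up to tol).
    """
    D = np.asarray(M, dtype=float) - np.asarray(N, dtype=float)
    D = (D + D.T) / 2  # symmetrize to remove round-off asymmetry
    return bool(np.linalg.eigvalsh(D).min() >= -tol)


def m_norm(x, M):
    """||x||_M = sqrt(x^T M x) for a positive semidefinite matrix M."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(x @ np.asarray(M, dtype=float) @ x))


# Usage: diag(1, 2) ⪯ diag(2, 3), but not the other way around.
print(loewner_leq(np.diag([1.0, 2.0]), np.diag([2.0, 3.0])))  # True
print(loewner_leq(np.diag([2.0, 3.0]), np.diag([1.0, 2.0])))  # False
print(m_norm([3.0, 4.0], np.eye(2)))                           # 5.0
```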

Standard Convex Sets: We let $B_p(r) := \{x : \|x\|_p \le r\}$ denote a ball of radius $r$ in the $\ell_p$
norm. For brevity we refer to $B_2(r)$ as a ball of radius $r$ and $B_\infty(r)$ as a box of radius $r$.

Misc: We let $\omega < 2.373$ [110] denote the matrix multiplication constant.

3.2 Separation Oracles 

Throughout this paper we frequently make assumptions about the existence of separation oracles 
for sets and functions. Here we formally define these objects as we use them throughout the paper. 
Our definitions are possibly non-standard and chosen to handle the different settings that occur in 
this paper. 

Definition 1 (Separation Oracle for a Set). Given a set $K \subset \mathbb{R}^n$ and $\delta \ge 0$, a $\delta$-separation oracle
for $K$ is a function on $\mathbb{R}^n$ such that for any input $x \in \mathbb{R}^n$, it either outputs “successful” or a half
space of the form $H = \{z : c^\top z \le c^\top x + b\} \supseteq K$ with $0 \le b \le \delta \|c\|_2$ and $c \neq 0$. We let $\mathrm{SO}_\delta(K)$ be the
time complexity of this oracle.

For brevity we refer to a 0-separation oracle for a set as just a separation oracle. We refer to
the hyperplanes defining the halfspaces returned by a $\delta$-separation oracle as oracle hyperplanes.
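To make Definition 1 concrete, here is a minimal sketch, assuming NumPy, of a separation oracle (the case $\delta = 0$) for a Euclidean ball. The function name and the convention of returning `None` for “successful” and a pair `(c, b)` for the half space are illustrative choices, not an interface defined in the paper.

```python
import numpy as np


def ball_separation_oracle(center, radius):
    """Return a (0-)separation oracle for K = {z : ||z - center||_2 <= radius}.

    The oracle returns None for "successful" (x in K). Otherwise it returns
    (c, b) with c = x - center != 0 and b = 0, describing the half space
    H = {z : c^T z <= c^T x + b}. H contains K because for every z in K,
    c^T z <= c^T center + radius * ||c|| < c^T center + ||c||^2 = c^T x.
    """
    center = np.asarray(center, dtype=float)

    def oracle(x):
        x = np.asarray(x, dtype=float)
        c = x - center
        if np.linalg.norm(c) <= radius:
            return None
        return c, 0.0

    return oracle


oracle = ball_separation_oracle([0.0, 0.0], 1.0)
print(oracle([0.5, 0.0]))  # None: the point lies in K
print(oracle([2.0, 0.0]))  # (array([2., 0.]), 0.0)
```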

Note that in Definition 1 we do not assume that $K$ is convex. However, we remark that it is
well known that there is a separation oracle for a set if and only if it is convex and that there is a
$\delta$-separation oracle if and only if the set is close to convex in some sense.

Definition 2 (Separation Oracle for a Function). For any convex function $f$, $\eta \ge 0$ and $\delta \ge 0$, an
$(\eta, \delta)$-separation oracle on a convex set $\Gamma$ for $f$ is a function on $\mathbb{R}^n$ such that for any input $x \in \Gamma$,
it either asserts $f(x) \le \min_{y \in \Gamma} f(y) + \eta$ or outputs a half space $H$ such that

$$\{z \in \Gamma : f(z) \le f(x)\} \subseteq H := \{z : c^\top z \le c^\top x + b\} \qquad (3.1)$$

with $b \le \delta \|c\|$ and $c \neq 0$. We let $\mathrm{SO}_{\eta,\delta}(f)$ be the time complexity of this oracle.
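For a differentiable convex $f$, the gradient already gives a separation oracle in the sense of Definition 2 with $\eta = \delta = 0$: convexity yields $f(z) \ge f(x) + \nabla f(x)^\top (z - x)$, so every $z$ with $f(z) \le f(x)$ satisfies $\nabla f(x)^\top z \le \nabla f(x)^\top x$, which is the containment (3.1). The sketch below follows this argument; it assumes NumPy, and the function name and the `None`/`(c, b)` return convention are hypothetical. It only asserts optimality when the gradient is exactly zero, so it never uses the slack $\eta$.

```python
import numpy as np


def gradient_separation_oracle(grad_f):
    """Sketch of an (eta, delta) = (0, 0) separation oracle for a convex, differentiable f.

    Returns None when grad f(x) = 0 (then x minimizes f, so the assertion
    f(x) <= min f + eta holds); otherwise returns (c, b) = (grad f(x), 0).
    Convexity gives f(z) >= f(x) + c^T (z - x), so every z with f(z) <= f(x)
    satisfies c^T z <= c^T x, which is the containment required in (3.1).
    """
    def oracle(x):
        x = np.asarray(x, dtype=float)
        g = np.asarray(grad_f(x), dtype=float)
        if not np.any(g):
            return None
        return g, 0.0

    return oracle


# Usage with f(x) = ||x - a||^2, whose gradient is 2 (x - a).
a = np.array([1.0, 2.0])
oracle = gradient_separation_oracle(lambda x: 2.0 * (x - a))
print(oracle(a))            # None: a is the minimizer
print(oracle([0.0, 0.0]))   # (array([-2., -4.]), 0.0)
```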
Part I


A Faster Cutting Plane Method 

4 Introduction 

Throughout Part I we study the following feasibility problem: 

Definition 3 (Feasibility Problem). Given a separation oracle for a set $K \subset \mathbb{R}^n$ contained in a box
of radius $R$, either find a point $x \in K$ or prove that $K$ does not contain a ball of radius $\epsilon$.

This feasibility problem is one of the most fundamental and classic problems in optimization. 
Since the celebrated results of Yudin and Nemirovski [112] in 1976 and Khachiyan [65] in 1979,
essentially proving that it can be solved in time $O(\mathrm{poly}(n) \cdot \mathrm{SO} \cdot \log(R/\epsilon))$, this problem has served
as one of the key primitives for solving numerous problems in both combinatorial and convex
optimization.

Despite the prevalence of this feasibility problem, the best known running time for solving this
problem has not been improved in over 25 years. In a seminal 1989 paper [103], Vaidya
showed how to solve the problem in $O(n \cdot \mathrm{SO} \cdot \log(nR/\epsilon) + n^{\omega+1} \log(nR/\epsilon))$ time. Despite interesting
generalizations and practical improvements [5, 92, 43, 10, 44, 87, 111, 45, 15], the best theoretical
guarantees for solving this problem have not been improved since.

In Part I we show how to improve upon Vaidya’s running time in certain regimes. We provide
a cutting plane algorithm which achieves an expected running time of $O(n \cdot \mathrm{SO} \cdot \log(nR/\epsilon) +
n^3 \log^{O(1)}(nR/\epsilon))$, improving upon the previous best known running time for the current known
value of $\omega < 2.373$ [110, 41] when $R/\epsilon = O(\mathrm{poly}(n))$.

We achieve our results by combining multiple techniques. First, we show how to use
techniques from the work of Vaidya and Atkinson to modify Vaidya’s scheme so that it is able to
tolerate random noise in the computation in each iteration. We then show how to use known numerical
machinery [104, 99, 76] in combination with some new techniques (Section 7.1 and Section 7.2)
to implement each of these relaxed iterations efficiently. We hope that both these numerical techniques
and our scheme for approximating complicated methods, such as Vaidya’s, may find
further applications.

While our paper focuses on theoretical aspects of cutting plane methods, we achieve our results 
via the careful application of practical techniques such as dimension reduction and sampling. As 
such we hope that ideas in this paper may lead to improved practical^ algorithms for non-smooth 
optimization. 

4.1 Previous Work 

Throughout this paper, we restrict our attention to algorithms for the feasibility problem that have
a polynomial dependence on SO, $n$, and $\log(R/\epsilon)$. Such “efficient” algorithms typically use the
following iterative framework. First, they compute some trivial region $\Omega$ that contains $K$. Then,
they call the separation oracle at some point $x \in \Omega$. If $x \in K$ the algorithm terminates having
successfully solved the problem. If $x \notin K$ then the separation oracle must return a half-space

^Although cutting plane methods are often criticized for their empirical performance, recently, Bubeck, Lee and 
Singh [15] provided a variant of the ellipsoid method that achieves the same convergence rate as Nesterov’s accelerated 
gradient descent. Moreover, they provided numerical evidence that this method can be superior to Nesterov’s accelerated
gradient descent, thereby suggesting that cutting plane methods can be as aggressive as first order methods if
designed properly. 
Year | Algorithm | Complexity
1979 | Ellipsoid Method [97, 112, 65] | $O(n^2 \cdot \mathrm{SO} \cdot \log\kappa + n^4 \log\kappa)$
1988 | Inscribed Ellipsoid [66, 88] | $O(n \cdot \mathrm{SO} \cdot \log\kappa + (n \log\kappa)^{4.5})$
1989 | Volumetric Center [103] | $O(n \cdot \mathrm{SO} \cdot \log\kappa + n^{1+\omega} \log\kappa)$
1995 | Analytic Center [10] | $O(n \cdot \mathrm{SO} \cdot \log^2\kappa + n^{\omega+1} \log^2\kappa + (n \log\kappa)^{2+\omega/2})$
2004 | Random Walk [13] | $\to O(n \cdot \mathrm{SO} \cdot \log\kappa + n^7 \log\kappa)$
2013 | This paper | $O(n \cdot \mathrm{SO} \cdot \log\kappa + n^3 \log^{O(1)}\kappa)$

Table 7: Algorithms for the Feasibility Problem. $\kappa$ indicates $nR/\epsilon$. The arrow, $\to$, indicates that
it solves a more general problem where only a membership oracle is given.


containing $K$. The algorithm then uses this half-space to shrink the region while maintaining
the invariant that $K \subseteq \Omega$. The algorithm then repeats this process until it finds a point $x \in K$ or
the region becomes too small to contain a ball with radius $\epsilon$.

Previous works on efficient algorithms for the feasibility problem all follow this iterative framework.
They vary in terms of what set they maintain, how they compute the center to query the
separation oracle, and how they update the set. In Table 7, we list the previous running times for
solving the feasibility problem. As usual, SO indicates the cost of the separation oracle. To simplify
the running times we let $\kappa = nR/\epsilon$. The running times of some algorithms in the table depend on
$R/\epsilon$ instead of $nR/\epsilon$. However, in many situations we have $\log(R/\epsilon) = \Theta(\log(nR/\epsilon))$ and hence
we believe this is still a fair comparison.

The first efficient algorithm for the feasibility problem is the ellipsoid method, due to Shor [97],
Nemirovskii and Yudin [112], and Khachiyan [65]. The ellipsoid method maintains an ellipsoid as $\Omega$
and uses the center of the ellipsoid as the next query point. It takes $O(n^2 \log\kappa)$ calls to the oracle,
which is far from the lower bound of $\Omega(n \log\kappa)$ calls [86].
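The iterative framework above, instantiated with the ellipsoid method, can be sketched as follows. This is the textbook central-cut update, not the method of this paper. It assumes NumPy, $n \ge 2$, and a separation oracle with $b = 0$ that returns either `None` (for $x \in K$) or a nonzero vector $c$ with $K \subseteq \{z : c^\top z \le c^\top x\}$; the function and parameter names are hypothetical. It starts from the Euclidean ball of radius $\sqrt{n}R$, which contains the box $B_\infty(R)$, and reports infeasibility once the ellipsoid's volume drops below that of a ball of radius $\epsilon$.

```python
import numpy as np


def ellipsoid_feasibility(oracle, n, R, eps, max_iter=100_000):
    """Central-cut ellipsoid method for the feasibility problem (textbook sketch).

    oracle(x) returns None if x is in K, and otherwise a nonzero vector c with
    K contained in {z : c^T z <= c^T x}. K is assumed to lie in the box B_inf(R).
    Returns a point of K, or None once the ellipsoid is provably too small to
    contain a ball of radius eps.
    """
    if n < 2:
        raise ValueError("this update formula needs n >= 2")
    x = np.zeros(n)              # center of E = {z : (z - x)^T P^{-1} (z - x) <= 1}
    P = n * R**2 * np.eye(n)     # ball of radius sqrt(n) R contains B_inf(R)
    for _ in range(max_iter):
        c = oracle(x)
        if c is None:
            return x
        Pc = P @ c
        g = Pc / np.sqrt(c @ Pc)
        # Minimum-volume ellipsoid containing E ∩ {z : c^T z <= c^T x}.
        x = x - g / (n + 1)
        P = (n**2 / (n**2 - 1.0)) * (P - (2.0 / (n + 1)) * np.outer(g, g))
        # vol(E) / vol(unit ball) = sqrt(det P); a ball of radius eps gives eps^n.
        if 0.5 * np.linalg.slogdet(P)[1] < n * np.log(eps):
            return None
    return None


# Usage: K is a disk of radius 0.1 inside the box of radius R = 1.
center, r = np.array([0.3, -0.2]), 0.1


def disk_oracle(x):
    d = x - center
    return None if np.linalg.norm(d) <= r else d


x = ellipsoid_feasibility(disk_oracle, n=2, R=1.0, eps=1e-3)
print(np.linalg.norm(x - center) <= r)  # True
```

Because $K$ stays inside every ellipsoid, the volume test can only fire when $K$ cannot contain a ball of radius $\epsilon$; the $O(n^2)$ work per step for updating $\mathbf{P}$ is what the $n^4 \log\kappa$ term in Table 7 accounts for.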

To alleviate the problem, the algorithm could maintain all the information from the oracle, i.e.,
the polytope created from the intersection of all half-spaces obtained. The center of gravity method
[77] achieves the optimal oracle complexity using this polytope and the center of gravity of this
polytope as the next point. However, computing the center of gravity is computationally expensive and
hence we do not list its running time in Table 7. The Inscribed Ellipsoid Method [66] also achieved
an optimal oracle complexity using this polytope as $\Omega$ but instead using the center of the maximal
inscribed ellipsoid in the polytope to query the separation oracle. We listed it as occurring in
1988 in Table 7 because it was [88] that yielded the first polynomial time algorithm to actually
compute this maximal inscribed ellipsoid for a polytope.

Vaidya [103] obtained a faster algorithm by maintaining an approximation of this polytope and 
using a different center, namely the volumetric center. Although the oracle complexity of this 
volumetric center method is very good, the algorithm is not extremely efficient as each iteration 
involves matrix inversion. Atkinson and Vaidya [10] showed how to avoid this computation in 
certain settings. However, they were unable to achieve the desired convergence rate from their 
method. 

Bertsimas and Vempala [13] also give an algorithm that avoids these expensive linear algebra
operations while maintaining the optimal convergence rate by using techniques in sampling convex
sets. Even better, this result works for a much weaker oracle, the membership oracle. However,
the additional cost of this algorithm is relatively high in theory. We remark that while there are
considerable improvements on the sampling techniques [79, 63, 76], the additional cost is still quite
high compared to standard linear algebra.
4.2 Challenges in Improving Previous Work

Our algorithm builds upon the previous fastest algorithm of Vaidya [105]. Ignoring implementation
details and analysis, Vaidya’s algorithm is quite simple. This algorithm simply maintains a polytope
$P^{(k)} = \{x \in \mathbb{R}^n : \mathbf{A}x - b \ge 0\}$ as the current $\Omega$ and uses the volumetric center, the minimizer of
the following volumetric barrier function

$$\arg\min_x \frac{1}{2} \log\det(\mathbf{A}^\top \mathbf{S}_x^{-2} \mathbf{A}) \quad \text{where } \mathbf{S}_x := \mathrm{diag}(\mathbf{A}x - b) \qquad (4.1)$$

as the point at which to query the separation oracle. The polytope is then updated by adding shifts
of the half-spaces returned by the separation oracle and dropping unimportant constraints. By
choosing the appropriate shift, picking the right rule for dropping constraints, and using Newton’s
method to compute the volumetric center, he achieved a running time of $O(n \cdot \mathrm{SO} \cdot \log\kappa + n^{\omega+1} \log\kappa)$.

While Vaidya’s algorithm’s dependence on SO is essentially optimal, the additional per-iteration
costs of his algorithm could possibly be improved. The computational bottleneck in each iteration
of Vaidya’s algorithm is computing the gradient of log det, which in turn involves computing the
leverage scores $\sigma(x) := \mathrm{diag}(\mathbf{S}_x^{-1}\mathbf{A}(\mathbf{A}^\top\mathbf{S}_x^{-2}\mathbf{A})^{-1}\mathbf{A}^\top\mathbf{S}_x^{-1})$, a commonly occurring quantity in
numerical analysis and convex optimization [99, 19, 78, 76, 75]. As the best known algorithms for
computing leverage scores exactly in this setting take time $O(n^\omega)$, directly improving the running
time of Vaidya’s algorithm seems challenging.
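The following sketch (NumPy assumed; the function names are hypothetical) computes the leverage scores $\sigma(x)$ exactly through a QR factorization of $\mathbf{S}_x^{-1}\mathbf{A}$ and uses them to evaluate the gradient of the volumetric barrier in (4.1). Differentiating $\frac{1}{2}\log\det(\mathbf{A}^\top\mathbf{S}_x^{-2}\mathbf{A})$ gives $-\mathbf{A}^\top\mathbf{S}_x^{-1}\sigma(x)$, which is why the leverage scores are the bottleneck. This dense computation is far more expensive than the amortized bounds discussed in the text and is meant only to make the formulas concrete.

```python
import numpy as np


def slacks(A, b, x):
    s = A @ x - b
    if np.any(s <= 0):
        raise ValueError("x must satisfy Ax - b > 0")
    return s


def leverage_scores(A, b, x):
    """sigma(x) = diag(S^{-1} A (A^T S^{-2} A)^{-1} A^T S^{-1}) with S = diag(Ax - b)."""
    B = A / slacks(A, b, x)[:, None]     # S^{-1} A
    Q, _ = np.linalg.qr(B)               # B = QR, so B (B^T B)^{-1} B^T = Q Q^T
    return np.sum(Q**2, axis=1)          # diagonal of Q Q^T


def volumetric_barrier(A, b, x):
    """(1/2) log det(A^T S^{-2} A), the objective in (4.1)."""
    B = A / slacks(A, b, x)[:, None]
    return 0.5 * np.linalg.slogdet(B.T @ B)[1]


def volumetric_gradient(A, b, x):
    """Gradient of the volumetric barrier: -A^T S^{-1} sigma(x)."""
    return -A.T @ (leverage_scores(A, b, x) / slacks(A, b, x))


# Usage: the box [-1, 1]^2 cut by x_1 + x_2 >= -1.5, written as Ax - b >= 0.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([-1.0, -1.0, -1.0, -1.0, -1.5])
x = np.array([0.1, 0.2])
print(leverage_scores(A, b, x).sum())  # equals n = 2 up to round-off
print(volumetric_gradient(A, b, x))
```

The sum of the leverage scores equals the rank $n$, a quick sanity check that also appears in the analysis of volumetric methods.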

However, since an intriguing result of Spielman and Srivastava in 2008 [99], it has been well
known that using the Johnson-Lindenstrauss transform these leverage scores can be computed up to
a multiplicative $(1 \pm \epsilon)$ error by solving $O(\epsilon^{-2} \log n)$ linear systems involving $\mathbf{A}^\top\mathbf{S}_x^{-2}\mathbf{A}$. While
in general this still takes time $O(\epsilon^{-2} n^\omega)$, there are known techniques for efficiently maintaining
the inverse of a matrix so that solving linear systems takes amortized $O(n^2)$ time [104, 75, 76].
Consequently, if it could be shown that computing approximate leverage scores sufficed, this would
potentially decrease the amortized cost per iteration of Vaidya’s method.
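A minimal sketch of the Johnson-Lindenstrauss approach, assuming NumPy and writing $\mathbf{B} = \mathbf{S}_x^{-1}\mathbf{A}$: since $\sigma_i = \|\mathbf{P}e_i\|_2^2$ for the projection $\mathbf{P} = \mathbf{B}(\mathbf{B}^\top\mathbf{B})^{-1}\mathbf{B}^\top$, multiplying by a $k \times m$ Gaussian matrix $\mathbf{G}$ scaled by $1/\sqrt{k}$ gives estimates that only require $k$ linear systems in $\mathbf{B}^\top\mathbf{B}$. The Gaussian sketch, the function name, and the choice of $k$ in the usage are illustrative assumptions; the code uses a plain dense solve and none of the inverse-maintenance techniques cited above.

```python
import numpy as np


def approx_leverage_scores(B, k, seed=None):
    """Johnson-Lindenstrauss estimate of the leverage scores of B (m x n, full column rank).

    sigma_i = ||P e_i||^2 with P = B (B^T B)^{-1} B^T. For G with i.i.d. N(0, 1/k)
    entries, ||G P e_i||^2 is an unbiased estimate of sigma_i, and k = O(eps^-2 log m)
    rows suffice for a (1 +/- eps) approximation of every score with high probability.
    """
    rng = np.random.default_rng(seed)
    m, _ = B.shape
    G = rng.standard_normal((k, m)) / np.sqrt(k)
    Y = np.linalg.solve(B.T @ B, B.T @ G.T)   # k linear systems in B^T B
    Z = B @ Y                                 # Z = P G^T; row i equals (G P e_i)^T
    return np.sum(Z**2, axis=1)


# Usage: compare against exact scores computed from a QR factorization.
rng = np.random.default_rng(0)
B = rng.standard_normal((200, 5))
exact = np.sum(np.linalg.qr(B)[0] ** 2, axis=1)
est = approx_leverage_scores(B, k=2000, seed=1)
print(np.max(np.abs(est - exact) / exact))  # relative error shrinks like 1/sqrt(k)
```

Each estimate is exactly $\sigma_i$ times a $\chi^2_k/k$ variable, so the relative error does not depend on how small $\sigma_i$ is; this is the multiplicative guarantee mentioned in the text.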

Unfortunately, Vaidya’s method does not seem to tolerate this type of multiplicative error. If 
leverage scores were computed this crudely then in using them to compute approximate gradients 
for (4.1), it seems that any point computed would be far from the true center. Moreover, without 
being fairly close to the true volumetric center, it is difficult to argue that such a cutting plane 
method would make sufficient progress. 

To overcome this issue, it is tempting to directly use recent work on improving the running time
of linear programming [75]. In this work, the authors faced a similar issue where a volumetric, i.e. log det,
potential function had the right analytic and geometric properties but was computationally
expensive to minimize. To overcome this issue the authors instead computed a weighted analytic
center:

$$\arg\min_x -\sum_{i \in [m]} w_i \log s_i(x) \quad \text{where } s(x) := \mathbf{A}x - b.$$

For carefully chosen weights this center provides the same convergence guarantees as the volumetric
potential function, while each step can be computed by solving a few linear systems (rather than
forming the matrix inverse).
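To illustrate why such a center is cheap to compute, here is a damped Newton sketch for minimizing $-\sum_i w_i \log s_i(x)$ over a bounded polytope. It assumes NumPy and positive weights; the function name, the backtracking constants, and the stopping rule are our own illustrative choices. Each step solves one linear system with the Hessian $\mathbf{A}^\top\mathbf{S}_x^{-1}\mathbf{W}\mathbf{S}_x^{-1}\mathbf{A}$, where $\mathbf{W} = \mathrm{diag}(w)$, and never needs leverage scores.

```python
import numpy as np


def weighted_analytic_center(A, b, w, x0, max_iter=100, tol=1e-14):
    """Damped Newton sketch for argmin_x -sum_i w_i log s_i(x), s(x) = Ax - b.

    Assumes w > 0, a bounded polytope {x : Ax - b >= 0}, and a strictly
    feasible start x0. gradient = -A^T S^{-1} w, Hessian = A^T S^{-1} W S^{-1} A.
    """
    w = np.asarray(w, dtype=float)
    x = np.asarray(x0, dtype=float)

    def f(y):
        s = A @ y - b
        return np.inf if np.any(s <= 0) else -np.sum(w * np.log(s))

    if not np.isfinite(f(x)):
        raise ValueError("x0 must be strictly feasible")
    for _ in range(max_iter):
        s = A @ x - b
        g = -A.T @ (w / s)
        H = A.T @ ((w / s**2)[:, None] * A)
        dx = np.linalg.solve(H, -g)          # one linear system per step
        if -g @ dx < tol:                    # squared Newton decrement
            break
        t = 1.0
        # Backtracking line search: stay feasible and decrease f enough.
        while f(x + t * dx) > f(x) + 0.25 * t * (g @ dx):
            t *= 0.5
        x = x + t * dx
    return x


# Usage: constraints x >= -1 and x <= 1 with weights (3, 1).
A = np.array([[1.0], [-1.0]])
b = np.array([-1.0, -1.0])
x = weighted_analytic_center(A, b, w=[3.0, 1.0], x0=[0.0])
print(x)  # close to [0.5], since the optimum is (w1 - w2) / (w1 + w2)
```

Raising a weight pushes the center away from the corresponding constraint, which is the lever that weighted analytic center methods use to mimic the volumetric center.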

Unfortunately, it is unclear how to directly extend the work in [75] on solving an explicit 
linear program to the feasibility problem specified by a separation oracle. While it is possible 
to approximate the volumetric barrier by a weighted analytic center in many respects, proving 
that this approximation suffices for fast convergence remains open. In fact, the volumetric barrier 
function as used in Vaidya’s algorithm is well approximated simply by the standard analytic center

$$\arg\min_x -\sum_{i \in [m]} \log s_i(x) \quad \text{where } s(x) := \mathbf{A}x - b$$

as all the unimportant constraints are dropped during the algorithm. However, despite decades of
research, the best running times known for solving the feasibility problem using the analytic center
are those of the Vaidya and Atkinson algorithm from 1995 [10]. While the running time of this algorithm could
possibly be improved using approximate leverage score computations and amortized efficient linear
system solvers, unfortunately at best, without further insight this would yield an algorithm which
requires a suboptimal $O(n \log^{O(1)} \kappa)$ queries to the separation oracle.

As pointed out in [10], the primary difficulty in using any sort of analytic center is quantifying
the amount of progress made in each step. We still believe that providing a direct near-optimal analysis of the
weighted analytic center is a tantalizing open question warranting further investigation. However,
rather than directly address the question of the performance of weighted analytic centers for the
feasibility problem, we take a slightly different approach that side-steps this issue. We provide a
partial answer that still sheds some light on the performance of the weighted analytic center while
still providing our desired running time improvements.

4.3 Our Approach 

To overcome the shortcomings of the volumetric and analytic centers we instead consider a hybrid barrier function

$$\arg\min_x -\sum_{i\in[m]} w_i\log s_i(x) + \frac{1}{2}\log\det\left(A^\top S_x^{-2}A\right) \quad\text{where } s(x) \stackrel{\text{def}}{=} Ax - b,$$

for carefully chosen weights. Our key observation is that for the correct choice of weights, we can compute the gradient of this potential function. In particular, if for a vector $\tau \in \mathbb{R}^m$ we let $w = \tau - \sigma(x)$, then the gradient of this potential function is the same as the gradient of $-\sum_{i\in[m]}\tau_i\log s_i(x)$, which we can compute efficiently. Moreover, since we are using $\log\det$, we can use analysis similar to Vaidya's algorithm [103] to analyze the convergence rate of this algorithm.

Unfortunately, this is a simple observation and does not immediately change the problem substantially. It simply pushes the problem of computing gradients of $\log\det$ to computing $w$. Therefore, for this scheme to work, we would need to ensure that the weights do not change too much and that when they change, they do not significantly hurt the progress of our algorithm. In other words, for this scheme to work, we would still need very precise estimates of leverage scores.

However, we note that the leverage scores $\sigma(x)$ do not change too much between iterations. Moreover, we provide what we believe is an interesting technical result: unbiased estimates of the changes in leverage scores can be computed using linear system solvers such that the total error of the estimates is bounded by the total change of the leverage scores (see Section 7.1). Using this result, our scheme simply follows Vaidya's basic scheme in [103]; however, instead of minimizing the hybrid barrier function directly, we alternate between taking Newton steps we can compute, changing the weights so that we can still compute Newton steps, and computing accurate unbiased estimates of the changes in the leverage scores so that the weights do not change adversarially by too much.

To make this scheme work, there are three additional details that need to be dealt with. First, we cannot let the weights vary too much, as this might ultimately hurt the rate of progress of our algorithm. Therefore, in every iteration we compute a single leverage score to high precision to control the value of $w_i$, and we show that by a careful choice of the index we can ensure that no weight gets too large (see Section 7.2).

Second, we need to show that changing weights does not affect our progress by much more than the progress we make with respect to $\log\det$. To do this, we need to show the slacks are bounded above and below. We enforce this by adding regularization terms and instead consider the potential function

$$p_e(x) = -\sum_{i\in[m]} w_i\log s_i(x) + \frac{1}{2}\log\det\left(A^\top S_x^{-2}A + \lambda I\right) + \frac{\lambda}{2}\|x\|_2^2.$$

This allows us to ensure that the entries of $s(x)$ do not get too large or too small and therefore changing the weighting of the analytic center cannot affect the function value too much.

Third, we need to make sure our potential function is convex. If we simply take $w = \tau - \sigma(x)$ with $\tau$ an estimator of $\sigma(x)$, then $w$ can be negative and the potential function could be non-convex. To circumvent this issue, we use $w = c_e\mathbf{1} + \tau - \sigma(x)$ and make sure $\|\tau - \sigma(x)\|_\infty < c_e$.

Combining these insights, using efficient algorithms for solving a sequence of slowly changing linear systems [104, 75, 76], and providing careful analysis ultimately allows us to achieve a running time of $O(n\cdot\mathrm{SO}\log\kappa + n^3\log^{O(1)}\kappa)$ for the feasibility problem. Furthermore, in the case that $K$ does not contain a ball of radius $\epsilon$, our algorithm provides a proof that the polytope does not contain a ball of radius $\epsilon$. This proof ultimately allows us to achieve running time improvements for strongly polynomial submodular minimization in Part III.

4.4 Organization 

The rest of Part I is organized as follows. In Section 5 we provide some preliminary information 
and notation we use throughout Part I. In Section 6 we then provide and analyze our cutting plane 
method. In Section 7 we provide key technical tools which may be of independent interest. 

5 Preliminaries 

Here we introduce some notation and concepts we use throughout Part I. 

5.1 Leverage Scores 

Our algorithms in this section make extensive use of leverage scores, a common measure of the importance of rows of a matrix. We denote the leverage scores of a matrix $A \in \mathbb{R}^{n\times d}$ by $\sigma \in \mathbb{R}^n$ and say the leverage score of row $i \in [n]$ is $\sigma_i \stackrel{\text{def}}{=} [A(A^\top A)^{-1}A^\top]_{ii}$. For $A \in \mathbb{R}^{n\times d}$, $d \in \mathbb{R}^n_{\ge 0}$, and $D \stackrel{\text{def}}{=} \mathrm{diag}(d)$ we use the shorthand $\sigma_A(d)$ to denote the leverage scores of the matrix $D^{1/2}A$. We frequently use well known facts regarding leverage scores, such as $\sigma_i \in [0,1]$ and $\|\sigma\|_1 \le d$. (See [99, 80, 78, 19] for a more in-depth discussion of leverage scores, their properties, and their many applications.) In addition, we make use of the fact that given an efficient linear system solver for $A^\top A$ we can efficiently compute multiplicative approximations to leverage scores (see Definition 4 and Lemma 5 below).

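Definition 4 (Linear System Solver). An algorithm $\mathcal{S}$ is a $\mathrm{LO}$-time solver of a PD matrix $M \in \mathbb{R}^{n\times n}$ if for all $b \in \mathbb{R}^n$ and $\epsilon \in (0, 1/2]$, the algorithm outputs a vector $\mathcal{S}(b, \epsilon) \in \mathbb{R}^n$ in time $O(\mathrm{LO}\cdot\log(\epsilon^{-1}))$ such that with high probability in $n$, $\|\mathcal{S}(b,\epsilon) - M^{-1}b\|_M^2 \le \epsilon\|M^{-1}b\|_M^2$.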

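Lemma 5 (Computing Leverage Scores [99]). Let $A \in \mathbb{R}^{n\times d}$, let $\sigma$ denote the leverage scores of $A$, and let $\epsilon > 0$. If we have a $\mathrm{LO}$-time solver for $A^\top A$ then in time $O((\mathrm{nnz}(A) + \mathrm{LO})\epsilon^{-2}\log(\epsilon^{-1}))$ we can compute $\tau \in \mathbb{R}^n$ such that with high probability in $n$, $(1 - \epsilon)\sigma_i \le \tau_i \le (1 + \epsilon)\sigma_i$ for all $i \in [n]$.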
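To make Lemma 5 concrete, the following numpy sketch compares exact leverage scores with a Johnson-Lindenstrauss estimate in the spirit of [99]. It is an illustration under stated assumptions, not the construction analyzed in the paper: a dense `numpy.linalg.solve` stands in for the $\mathrm{LO}$-time solver, and the function names and the sketch size `k` are hypothetical.

```python
import numpy as np


def leverage_scores_exact(A):
    """sigma_i = a_i^T (A^T A)^{-1} a_i for every row a_i of A."""
    # Solve (A^T A) X = A^T instead of forming the inverse explicitly.
    X = np.linalg.solve(A.T @ A, A.T)
    return np.einsum("ij,ji->i", A, X)


def leverage_scores_jl(A, k, rng):
    """Estimate sigma_i = ||A (A^T A)^{-1} a_i||_2^2 with a k-row Gaussian sketch."""
    n = A.shape[0]
    G = rng.standard_normal((k, n)) / np.sqrt(k)  # E[G^T G] = I
    # Z = G A (A^T A)^{-1}: only k right-hand sides for the solver instead of n.
    Z = np.linalg.solve(A.T @ A, (G @ A).T).T
    # Column i of Z A^T is Z a_i, whose squared norm estimates sigma_i.
    return np.sum((Z @ A.T) ** 2, axis=0)
```

Each of the `k` sketch rows costs one solver call, which is why a solver for $A^\top A$ together with roughly $\epsilon^{-2}$ sketch rows suffices for multiplicative accuracy.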
5.2 Hybrid Barrier Function

As explained in Section 4.3, our cutting plane method maintains a polytope $P = \{x \in \mathbb{R}^n : Ax \ge b\}$ for $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$ that contains some target set $K$. We then maintain a minimizer of the following hybrid barrier function:

$$p_e(x) \stackrel{\text{def}}{=} -\sum_{i\in[m]}(c_e + e_i)\log s_i(x) + \frac{1}{2}\log\det\left(A^\top S_x^{-2}A + \lambda I\right) + \frac{\lambda}{2}\|x\|_2^2$$

where $e \in \mathbb{R}^m$ is a variable we maintain, $c_e > 0$ and $\lambda > 0$ are constants we fix later, $s(x) \stackrel{\text{def}}{=} Ax - b$, and $S_x \stackrel{\text{def}}{=} \mathrm{diag}(s(x))$. When the meaning is clear from context we often use the shorthand $A_x \stackrel{\text{def}}{=} S_x^{-1}A$.

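Rather than maintaining $e$ explicitly, we instead maintain a vector $\tau \in \mathbb{R}^m$ that approximates the leverage score

$$\psi(x) \stackrel{\text{def}}{=} \mathrm{diag}\left(A_x\left(A_x^\top A_x + \lambda I\right)^{-1}A_x^\top\right).$$

Note that $\psi(x)$ is simply the leverage scores of certain rows of the matrix

$$\begin{bmatrix} A_x \\ \sqrt{\lambda}I \end{bmatrix}$$

and therefore the usual properties of leverage scores hold, i.e. $\psi_i(x) \in (0,1)$ and $\|\psi(x)\|_1 \le n$.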
We write $\psi(x)$ equivalently as $\psi_{A_x}$ or $\psi_P$ when we want the matrix to be clear. Furthermore, we let $\Psi_x \stackrel{\text{def}}{=} \mathrm{diag}(\psi(x))$ and $\mu(x) \stackrel{\text{def}}{=} \min_i \psi_i(x)$. Finally, we typically pick $e$ using the function $e_P(\tau, x) \stackrel{\text{def}}{=} \tau - \psi_P(x)$. Again, we use the subscripts $x$, $A_x$, and $P$ interchangeably and often drop them when the meaning is clear from context.
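The following numpy sketch computes $\psi(x)$ directly from its definition and, as a check of the remark above, as the first $m$ ordinary leverage scores of the stacked matrix $[A_x; \sqrt{\lambda}I]$. Dense solves are an assumption made for illustration, and `psi` and `psi_via_stacking` are hypothetical helper names.

```python
import numpy as np


def psi(A, b, x, lam):
    """psi(x) = diag(A_x (A_x^T A_x + lam I)^{-1} A_x^T) with A_x = S_x^{-1} A."""
    s = A @ x - b
    if np.any(s <= 0):
        raise ValueError("x must lie in the interior of P = {y : Ay >= b}")
    Ax = A / s[:, None]  # row i of A divided by the slack s_i(x)
    M = Ax.T @ Ax + lam * np.eye(A.shape[1])
    return np.einsum("ij,ji->i", Ax, np.linalg.solve(M, Ax.T))


def psi_via_stacking(A, b, x, lam):
    """First m ordinary leverage scores of the matrix [A_x; sqrt(lam) I]."""
    s = A @ x - b
    Ax = A / s[:, None]
    n = A.shape[1]
    B = np.vstack([Ax, np.sqrt(lam) * np.eye(n)])  # B^T B = A_x^T A_x + lam I
    lev = np.einsum("ij,ji->i", B, np.linalg.solve(B.T @ B, B.T))
    return lev[: A.shape[0]]
```

Since all $m + n$ leverage scores of the stacked matrix sum to $n$ and the last $n$ are strictly positive, the first $m$ lie in $(0,1)$ and sum to less than $n$.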

We remark that the last term $\frac{\lambda}{2}\|x\|_2^2$ ensures that our point is always within a certain region (Lemma 23) and hence the term $(c_e + e_i)\log s_i(x)$ never gets too large. However, this term changes the Hessian of the potential function and hence we need to put a $\lambda I$ term inside both the $\log\det$ and the leverage score to reflect this. This is the reason why we use $\psi$ instead of the standard leverage score.

6 Our Cutting Plane Method 

In this section we develop and prove the correctness of our cutting plane method. We use the 
notation introduced in Section 3 and Section 5 as well as the technical tools we introduce in 
Section 7. 

We break the presentation and proof of correctness of our cutting plane method into multiple parts. First, in Section 6.1 we describe how we maintain a center of the hybrid barrier function $p_e$ and analyze this procedure. Then, in Section 6.2 we carefully analyze the effect of changing constraints on the hybrid barrier function, and in Section 6.3 we prove properties of an approximate center of the hybrid barrier function, which we call a hybrid center. In Section 6.4 we then provide our cutting plane method, and in Section 6.5 we prove that the cutting plane method solves the feasibility problem as desired.

6.1 Centering 

In this section we show how to compute approximate centers, or minimizers, of the hybrid barrier function for the current polytope $P = \{x : Ax \ge b\}$. We split this proof up into multiple parts. First we simply bound the gradient and Hessian of the hybrid barrier function, $p_e$, as follows.
Lemma 6. For $f(x) \stackrel{\text{def}}{=} \frac{1}{2}\log\det(A_x^\top A_x + \lambda I)$, we have that

$$\nabla f(x) = -A_x^\top\psi(x) \quad\text{and}\quad A_x^\top\Psi(x)A_x \preceq \nabla^2 f(x) \preceq 3A_x^\top\Psi(x)A_x.$$


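Proof. Our proof is similar to [4, Appendix], which proved the statement when $\lambda = 0$. The case $\lambda > 0$ does not change the derivation significantly; however, for completeness we include the proof below.

We take derivatives with respect to $s$ first and then apply the chain rule. Let $f(s) = \frac{1}{2}\log\det(A^\top S^{-2}A + \lambda I)$. We use the notation $Df(x)[h]$ to denote the directional derivative of $f$ along the direction $h$ at the point $x$. Using the standard formula for the derivative of $\log\det$, i.e. $\frac{d}{dt}\log\det B_t = \mathrm{Tr}\left(B_t^{-1}\frac{dB_t}{dt}\right)$, we have, with $H \stackrel{\text{def}}{=} \mathrm{diag}(h)$,

$$Df(s)[h] = \frac{1}{2}\mathrm{Tr}\left((A^\top S^{-2}A + \lambda I)^{-1}\left(A^\top(-2)S^{-3}HA\right)\right) \quad (6.1)$$
$$= -\sum_i \frac{h_i}{s_i}\,1_i^\top S^{-1}A(A^\top S^{-2}A + \lambda I)^{-1}A^\top S^{-1}1_i = -\sum_i \frac{\psi_i h_i}{s_i}.$$

Applying the chain rule, we have $\nabla f(x) = -A^\top S^{-1}\psi = -A_x^\top\psi(x)$. Now let $P \stackrel{\text{def}}{=} S^{-1}A(A^\top S^{-2}A + \lambda I)^{-1}A^\top S^{-1}$.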

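Taking the derivative of (6.1) again and using the cyclic property of trace, we have

$$D^2f(s)[h_1,h_2] = \mathrm{Tr}\left((A^\top S^{-2}A + \lambda I)^{-1}(A^\top(-2)S^{-3}H_2A)(A^\top S^{-2}A + \lambda I)^{-1}(A^\top S^{-3}H_1A)\right) - \mathrm{Tr}\left((A^\top S^{-2}A + \lambda I)^{-1}(A^\top(-3)S^{-4}H_2H_1A)\right)$$
$$= 3\,\mathrm{Tr}\left(PS^{-2}H_2H_1\right) - 2\,\mathrm{Tr}\left(PS^{-1}H_2PS^{-1}H_1\right) = 3\sum_i P_{ii}\frac{h_1(i)h_2(i)}{s_i^2} - 2\sum_{i,j}P_{ij}^2\frac{h_1(i)h_2(j)}{s_is_j}.$$

Consequently, $D^2f(s)[1_i, 1_j] = \left[S^{-1}\left(3\Psi - 2P^{(2)}\right)S^{-1}\right]_{ij}$, where $P^{(2)}$ is the Schur product of $P$ with itself.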

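Now note that

$$\sum_i P_{ij}^2 = 1_j^\top S^{-1}A(A^\top S^{-2}A + \lambda I)^{-1}A^\top S^{-2}A(A^\top S^{-2}A + \lambda I)^{-1}A^\top S^{-1}1_j \le 1_j^\top S^{-1}A(A^\top S^{-2}A + \lambda I)^{-1}A^\top S^{-1}1_j = P_{jj} = \psi_j.$$

Hence, the Gershgorin circle theorem shows that the eigenvalues of $\Psi - P^{(2)}$ lie in the union of the intervals $[0, 2\psi_j]$ over all $j$. Hence, $\Psi - P^{(2)} \succeq 0$. On the other hand, the Schur product theorem shows that $P^{(2)} \succeq 0$ as $P \succeq 0$. Therefore $\Psi \preceq 3\Psi - 2P^{(2)} \preceq 3\Psi$, and the result follows by the chain rule. □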
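Both claims of Lemma 6 can be sanity-checked numerically. The sketch below is only an illustration with dense linear algebra and hypothetical helper names: it compares $\nabla f(x) = -A_x^\top\psi(x)$ against central finite differences of $f$, and checks that a finite-difference Hessian lies between $A_x^\top\Psi(x)A_x$ and $3A_x^\top\Psi(x)A_x$.

```python
import numpy as np


def f_logdet(A, b, x, lam):
    """f(x) = 1/2 log det(A_x^T A_x + lam I)."""
    Ax = A / (A @ x - b)[:, None]
    _, logdet = np.linalg.slogdet(Ax.T @ Ax + lam * np.eye(A.shape[1]))
    return 0.5 * logdet


def grad_f(A, b, x, lam):
    """Lemma 6: grad f(x) = -A_x^T psi(x). Also returns A_x and psi(x)."""
    Ax = A / (A @ x - b)[:, None]
    M = Ax.T @ Ax + lam * np.eye(A.shape[1])
    psi = np.einsum("ij,ji->i", Ax, np.linalg.solve(M, Ax.T))
    return -Ax.T @ psi, Ax, psi


def fd_hessian(A, b, x, lam, h=1e-5):
    """Symmetrized central differences of the analytic gradient."""
    n = x.size
    H = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        H[:, j] = (grad_f(A, b, x + step, lam)[0] - grad_f(A, b, x - step, lam)[0]) / (2 * h)
    return 0.5 * (H + H.T)
```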


Lemma 6 immediately shows that under our choice of $e = e_P(\tau, x)$ we can compute the gradient of the hybrid barrier function, $p_e(x)$, efficiently. Formally, Lemma 6 immediately implies the following:

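Lemma 7 (Gradient). For $x \in P = \{y \in \mathbb{R}^n : Ay \ge b\}$ and $e \in \mathbb{R}^m$ we have

$$\nabla p_e(x) = -A_x^\top\left(c_e\mathbf{1} + e + \psi_P(x)\right) + \lambda x$$

and therefore for all $\tau \in \mathbb{R}^m$, we have

$$\nabla p_{e(\tau,x)}(x) = -A_x^\top\left(c_e\mathbf{1} + \tau\right) + \lambda x.$$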
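Lemma 7 is what makes each step cheap: for $e = e_P(\tau, x)$ the gradient needs only the estimate $\tau$, not $\psi(x)$. The numpy sketch below (hypothetical names, dense algebra) evaluates $p_e$ with $e$ frozen, as Remark 8 prescribes, and compares its finite-difference gradient with the closed form $-A_x^\top(c_e\mathbf{1} + \tau) + \lambda x$.

```python
import numpy as np


def slack_matrix(A, b, x):
    """Return s(x) = Ax - b and A_x = S_x^{-1} A."""
    s = A @ x - b
    return s, A / s[:, None]


def psi(A, b, x, lam):
    _, Ax = slack_matrix(A, b, x)
    M = Ax.T @ Ax + lam * np.eye(A.shape[1])
    return np.einsum("ij,ji->i", Ax, np.linalg.solve(M, Ax.T))


def p_e(A, b, x, lam, ce, e):
    """Hybrid barrier p_e(x) with the vector e held fixed."""
    s, Ax = slack_matrix(A, b, x)
    _, logdet = np.linalg.slogdet(Ax.T @ Ax + lam * np.eye(A.shape[1]))
    return -np.sum((ce + e) * np.log(s)) + 0.5 * logdet + 0.5 * lam * (x @ x)


def grad_from_tau(A, b, x, lam, ce, tau):
    """Lemma 7: gradient of p_{e(tau,x)} at x, computed without leverage scores."""
    _, Ax = slack_matrix(A, b, x)
    return -Ax.T @ (ce + tau) + lam * x
```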
Remark 8. To be clear, the vector $\nabla p_{e(\tau,x)}(x)$ is defined as the vector such that

$$\left[\nabla p_{e(\tau,x)}(x)\right]_i = \lim_{t\to 0}\frac{1}{t}\left[p_{e(\tau,x)}(x + t1_i) - p_{e(\tau,x)}(x)\right].$$

In other words, we treat the parameter $e(\tau, x)$ as fixed. This is the reason we denote it by a subscript: to emphasize that $p_e(x)$ is a family of functions, $p_{e(\tau,x)}$ is one particular function, and $\nabla p_{e(\tau,x)}$ means taking the gradient of that particular function.

Consequently, we can always compute $\nabla p_{e(\tau,x)}(x)$ efficiently. Now, we measure centrality, or how close we are to the hybrid center, as follows.

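Definition 9 (Centrality). For $x \in P = \{y \in \mathbb{R}^n : Ay \ge b\}$ and $e \in \mathbb{R}^m$, we define the centrality of $x$ by

$$\delta_e(x) \stackrel{\text{def}}{=} \|\nabla p_e(x)\|_{H(x)^{-1}}$$

where $H(x) \stackrel{\text{def}}{=} A_x^\top\left(c_eI + \Psi(x)\right)A_x + \lambda I$. Often, we use weights $w \in \mathbb{R}^m_{>0}$ to approximate this Hessian and consider $Q(x, w) \stackrel{\text{def}}{=} A_x^\top\left(c_eI + W\right)A_x + \lambda I$.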

Next, we bound how much slacks can change in a region close to a nearly central point. 

Lemma 10. Let $x \in P = \{y \in \mathbb{R}^n : Ay \ge b\}$ and $y \in \mathbb{R}^n$ such that $\|x - y\|_{H(x)} \le \epsilon\sqrt{c_e + \mu(x)}$ for $\epsilon < 1$. Then $y \in P$ and $(1 - \epsilon)S_x \preceq S_y \preceq (1 + \epsilon)S_x$.

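Proof. Direct calculation reveals the following:

$$\|S_x^{-1}(s_y - s_x)\|_\infty \le \|A_x(y - x)\|_2 \le \frac{1}{\sqrt{c_e + \mu(x)}}\|A_x(y - x)\|_{c_eI + \Psi(x)} \le \frac{1}{\sqrt{c_e + \mu(x)}}\|y - x\|_{H(x)} \le \epsilon.$$

Consequently, $(1 - \epsilon)S_x \preceq S_y \preceq (1 + \epsilon)S_x$. Since $y \in P$ if and only if $S_y \succeq 0$ the result follows. □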


Combining the previous lemmas we obtain the following. 

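Lemma 11. Let $x \in P = \{y \in \mathbb{R}^n : Ay \ge b\}$ and $e, w \in \mathbb{R}^m$ such that $\|e\|_\infty \le \frac{1}{2}c_e$ and $\Psi(x) \preceq W \preceq \frac{4}{3}\Psi(x)$. If $y$ satisfies $\|x - y\|_{Q(x,w)} \le \frac{1}{10}\sqrt{c_e + \mu(x)}$, then

$$\frac{1}{4}Q(x, w) \preceq \nabla^2 p_e(y) \preceq 8Q(x, w) \quad\text{and}\quad \frac{1}{2}H(x) \preceq H(y) \preceq 2H(x).$$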

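Proof. Lemma 6 shows that

$$A_y^\top\left(c_eI + E + \Psi(y)\right)A_y + \lambda I \preceq \nabla^2 p_e(y) \preceq A_y^\top\left(c_eI + E + 3\Psi(y)\right)A_y + \lambda I. \quad (6.2)$$

Since $W \succeq \Psi(x)$ we have that $Q(x, w) \succeq H(x)$ and therefore $\|x - y\|_{H(x)} \le \epsilon\sqrt{c_e + \mu(x)}$ with $\epsilon = 0.1$. Consequently, by Lemma 10 we have $(1 - \epsilon)S_x \preceq S_y \preceq (1 + \epsilon)S_x$ and therefore

$$\frac{(1-\epsilon)^2}{(1+\epsilon)^2}\Psi(x) \preceq \Psi(y) \preceq \frac{(1+\epsilon)^2}{(1-\epsilon)^2}\Psi(x)$$

and

$$\frac{1}{2}H(x) \preceq \frac{(1-\epsilon)^2}{(1+\epsilon)^4}H(x) \preceq H(y) \preceq \frac{(1+\epsilon)^2}{(1-\epsilon)^4}H(x) \preceq 2H(x).$$

Furthermore, (6.2) shows that

$$\nabla^2 p_e(y) \preceq A_y^\top\left(c_eI + E + 3\Psi(y)\right)A_y + \lambda I \preceq \frac{(1+\epsilon)^2}{(1-\epsilon)^4}A_x^\top\left(c_eI + E + 3\Psi(x)\right)A_x + \lambda I \preceq \frac{(1+\epsilon)^2}{(1-\epsilon)^4}A_x^\top\left(\frac{3}{2}c_eI + 3W\right)A_x + \lambda I \preceq \frac{3(1+\epsilon)^2}{(1-\epsilon)^4}Q(x, w) \preceq 8Q(x, w)$$

and

$$\nabla^2 p_e(y) \succeq A_y^\top\left(c_eI + E + \Psi(y)\right)A_y + \lambda I \succeq \frac{(1-\epsilon)^2}{(1+\epsilon)^4}A_x^\top\left(c_eI + E + \Psi(x)\right)A_x + \lambda I \succeq \frac{(1-\epsilon)^2}{(1+\epsilon)^4}A_x^\top\left(\frac{1}{2}c_eI + \frac{3}{4}W\right)A_x + \lambda I \succeq \frac{1}{4}Q(x, w).$$

□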


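To analyze our centering scheme we use standard facts about gradient descent, which we prove in Lemma 12.

Lemma 12 (Gradient Descent). Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice differentiable and $Q \in \mathbb{R}^{n\times n}$ be positive definite. Let $x_0 \in \mathbb{R}^n$ and $x_1 \stackrel{\text{def}}{=} x_0 - \frac{1}{L}Q^{-1}\nabla f(x_0)$. Furthermore, let $x_\alpha = x_0 + \alpha(x_1 - x_0)$ and suppose that $\mu Q \preceq \nabla^2 f(x_\alpha) \preceq LQ$ for all $\alpha \in [0,1]$. Then,

1. $\|\nabla f(x_1)\|_{Q^{-1}} \le \left(1 - \frac{\mu}{L}\right)\|\nabla f(x_0)\|_{Q^{-1}}$
2. $f(x_1) \ge f(x_0) - \frac{1}{L}\|\nabla f(x_0)\|_{Q^{-1}}^2$

Proof. Integrating we have that

$$\nabla f(x_1) = \nabla f(x_0) + \int_0^1\nabla^2 f(x_\alpha)(x_1 - x_0)\,d\alpha = \int_0^1\left(Q - \frac{1}{L}\nabla^2 f(x_\alpha)\right)Q^{-1}\nabla f(x_0)\,d\alpha.$$

Consequently, by applying Jensen's inequality we have

$$\|\nabla f(x_1)\|_{Q^{-1}} \le \int_0^1\left\|\left(Q - \frac{1}{L}\nabla^2 f(x_\alpha)\right)Q^{-1}\nabla f(x_0)\right\|_{Q^{-1}}d\alpha = \int_0^1\left\|Q^{-1/2}\nabla f(x_0)\right\|_{\left[Q^{-1/2}\left(Q - \frac{1}{L}\nabla^2 f(x_\alpha)\right)Q^{-1/2}\right]^2}d\alpha.$$

Now we know by assumption that

$$0 \preceq Q^{-1/2}\left(Q - \frac{1}{L}\nabla^2 f(x_\alpha)\right)Q^{-1/2} \preceq \left(1 - \frac{\mu}{L}\right)I$$

and therefore, combining these, (1) holds.

Using the convexity of $f$, we have

$$f(x_1) \ge f(x_0) + \langle\nabla f(x_0), x_1 - x_0\rangle \ge f(x_0) - \|\nabla f(x_0)\|_{Q^{-1}}\|x_1 - x_0\|_Q$$

and since $\|x_1 - x_0\|_Q = \frac{1}{L}\|\nabla f(x_0)\|_{Q^{-1}}$, (2) holds as well. □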
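For a quadratic $f$ the hypotheses of Lemma 12 hold with $\mu$ and $L$ equal to the extreme generalized eigenvalues of $(\nabla^2 f, Q)$, so both conclusions can be checked directly. This is a minimal sketch on random test data; the helper names are hypothetical.

```python
import numpy as np


def preconditioned_step(grad, Q, L, x0):
    """x1 = x0 - (1/L) Q^{-1} grad f(x0), the step analyzed in Lemma 12."""
    return x0 - np.linalg.solve(Q, grad(x0)) / L


def dual_norm(g, Q):
    """||g||_{Q^{-1}} = sqrt(g^T Q^{-1} g)."""
    return float(np.sqrt(g @ np.linalg.solve(Q, g)))


def generalized_extremes(M, Q):
    """Best constants with mu*Q <= M <= L*Q, from the eigenvalues of Q^{-1} M."""
    ev = np.linalg.eigvals(np.linalg.solve(Q, M)).real  # real since M, Q are SPD
    return ev.min(), ev.max()
```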

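Next we bound the effect of changing $e$ on the hybrid barrier function $p_e(x)$.

Lemma 13. For $x \in P = \{y \in \mathbb{R}^n : Ay \ge b\}$, $e, f \in \mathbb{R}^m$, and $w \in \mathbb{R}^m_{>0}$ such that $W \succeq \Psi(x)$, we have

$$\|\nabla p_f(x)\|_{Q(x,w)^{-1}} \le \|\nabla p_e(x)\|_{Q(x,w)^{-1}} + \frac{1}{\sqrt{c_e + \mu(x)}}\|f - e\|_2.$$

Proof. Direct calculation shows the following

$$\begin{aligned}
\|\nabla p_f(x)\|_{Q(x,w)^{-1}} &= \left\|-A_x^\top\left(c_e\mathbf{1} + f + \psi_P(x)\right) + \lambda x\right\|_{Q(x,w)^{-1}} && \text{(Formula for } \nabla p_f(x)\text{)}\\
&\le \|\nabla p_e(x)\|_{Q(x,w)^{-1}} + \|A_x^\top(f - e)\|_{Q(x,w)^{-1}} && \text{(Triangle inequality)}\\
&\le \|\nabla p_e(x)\|_{Q(x,w)^{-1}} + \frac{1}{\sqrt{c_e + \mu(x)}}\|A_x^\top(f - e)\|_{(A_x^\top A_x)^{-1}} && \text{(Bound on } Q(x,w)\text{)}\\
&\le \|\nabla p_e(x)\|_{Q(x,w)^{-1}} + \frac{1}{\sqrt{c_e + \mu(x)}}\|f - e\|_2 && \text{(Property of projection matrix)}
\end{aligned}$$

where in the second to third line we used $Q(x, w) \succeq H(x) \succeq (c_e + \mu(x))A_x^\top A_x$. □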
We now have everything we need to analyze our centering algorithm. 


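Algorithm 1: $(x^{(r)}, \tau^{(r)}) = \text{Centering}(x^{(0)}, \tau^{(0)}, r, c_\Delta)$

Input: Initial point $x^{(0)} \in P = \{y \in \mathbb{R}^n : Ay \ge b\}$, estimator of leverage scores $\tau^{(0)} \in \mathbb{R}^m$.
Input: Number of iterations $r > 0$, accuracy of the estimator $0 \le c_\Delta \le 0.01c_e$.
Given: $\|e^{(0)}\|_\infty \le \frac{1}{3}c_e$ where $e^{(0)} = e(\tau^{(0)}, x^{(0)})$.
Given: $\delta_{e^{(0)}}(x^{(0)}) = \|\nabla p_{e^{(0)}}(x^{(0)})\|_{H(x^{(0)})^{-1}} \le \frac{1}{100}\sqrt{c_e + \mu(x^{(0)})}$.
Compute $w$ such that $\Psi(x^{(0)}) \preceq W \preceq \frac{4}{3}\Psi(x^{(0)})$ (see Lemma 5).
Let $Q = Q(x^{(0)}, w)$.
for $k = 1$ to $r$ do
    $x^{(k)} := x^{(k-1)} - \frac{1}{8}Q^{-1}\nabla p_{e^{(k-1)}}(x^{(k-1)})$.
    Sample $\Delta^{(k)} \in \mathbb{R}^m$ such that $\mathbb{E}[\Delta^{(k)}] = \psi(x^{(k)}) - \psi(x^{(k-1)})$ and, with high probability in $n$,
        $\|\Delta^{(k)} - (\psi(x^{(k)}) - \psi(x^{(k-1)}))\|_2 \le c_\Delta\|S_{x^{(k-1)}}^{-1}(s_{x^{(k)}} - s_{x^{(k-1)}})\|_2$ (see Section 7.1).
    $\tau^{(k)} := \tau^{(k-1)} + \Delta^{(k)}$.
    $e^{(k)} := e(\tau^{(k)}, x^{(k)})$.
end
Output: $(x^{(r)}, \tau^{(r)})$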
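Lemma 14. Let $x^{(0)} \in P = \{y \in \mathbb{R}^n : Ay \ge b\}$ and let $\tau^{(0)} \in \mathbb{R}^m$ such that $\|e(\tau^{(0)}, x^{(0)})\|_\infty \le \frac{1}{3}c_e$. Assume that $r$ is a positive integer, $0 \le c_\Delta \le 0.01c_e$ and $\delta_{e^{(0)}}(x^{(0)}) \le \frac{1}{100}\sqrt{c_e + \mu(x^{(0)})}$. With high probability in $n$, the algorithm $\text{Centering}(x^{(0)}, \tau^{(0)}, r, c_\Delta)$ outputs $(x^{(r)}, \tau^{(r)})$ such that

1. $\delta_{e^{(r)}}(x^{(r)}) \le 2\left(1 - \frac{1}{64}\right)^r\delta_{e^{(0)}}(x^{(0)})$.
2. $\mathbb{E}[p_{e^{(r)}}(x^{(r)})] \ge p_{e^{(0)}}(x^{(0)}) - 8\left(\delta_{e^{(0)}}(x^{(0)})\right)^2$.
3. $\mathbb{E}[e^{(r)}] = e^{(0)}$ and $\|e^{(r)} - e^{(0)}\|_2 \le \frac{1}{10}c_\Delta$.
4. $\left\|S_{x^{(0)}}^{-1}\left(s(x^{(r)}) - s(x^{(0)})\right)\right\|_2 \le \frac{1}{10}$.

where $e^{(r)} = e(\tau^{(r)}, x^{(r)})$.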

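Proof. Let $\eta = \|\nabla p_{e^{(0)}}(x^{(0)})\|_{Q^{-1}}$. First, we use induction to prove that $\|x^{(r)} - x^{(0)}\|_Q \le 8\eta$, $\|\nabla p_{e^{(r)}}(x^{(r)})\|_{Q^{-1}} \le \left(1 - \frac{1}{64}\right)^r\eta$, and $\|e^{(r)} - e^{(0)}\|_2 \le \frac{1}{10}c_\Delta$ for all $r$.

Clearly the claims hold for $r = 0$. We now suppose they hold for all $r \le t$ and show that they hold for $r = t + 1$. Now, since $\|x^{(t)} - x^{(0)}\|_Q \le 8\eta$, $x^{(t+1)} = x^{(t)} - \frac{1}{8}Q^{-1}\nabla p_{e^{(t)}}(x^{(t)})$, and $\|\nabla p_{e^{(t)}}(x^{(t)})\|_{Q^{-1}} \le \left(1 - \frac{1}{64}\right)^t\eta \le \eta$, we have

$$\|x^{(t+1)} - x^{(0)}\|_Q \le \|x^{(t)} - x^{(0)}\|_Q + \frac{1}{8}\|\nabla p_{e^{(t)}}(x^{(t)})\|_{Q^{-1}} \le 9\eta.$$

We will improve this estimate later in the proof to finish the induction on $\|x^{(t+1)} - x^{(0)}\|_Q$, but using this, $\eta \le 0.01\sqrt{c_e + \mu(x^{(0)})}$, and $\|e^{(t)}\|_\infty \le \|e^{(0)}\|_\infty + \|e^{(t)} - e^{(0)}\|_\infty \le \frac{1}{3}c_e + \frac{1}{10}c_\Delta \le \frac{1}{2}c_e$, we can invoke Lemma 11 and Lemma 12 and therefore

$$\|\nabla p_{e^{(t)}}(x^{(t+1)})\|_{Q^{-1}} \le \left(1 - \frac{1}{32}\right)\|\nabla p_{e^{(t)}}(x^{(t)})\|_{Q^{-1}}.$$

By Lemma 13 we have

$$\|\nabla p_{e^{(t+1)}}(x^{(t+1)})\|_{Q^{-1}} \le \left(1 - \frac{1}{32}\right)\|\nabla p_{e^{(t)}}(x^{(t)})\|_{Q^{-1}} + \frac{1}{\sqrt{c_e + \mu(x^{(0)})}}\|e^{(t+1)} - e^{(t)}\|_2. \quad (6.3)$$

To bound $\|e^{(t+1)} - e^{(t)}\|_2$, we note that Lemma 10 and the induction hypothesis $\|x^{(t)} - x^{(0)}\|_{H(x^{(0)})} \le \|x^{(t)} - x^{(0)}\|_Q \le 8\eta$ show that $(1 - 0.1)S_{x^{(0)}} \preceq S_{x^{(t)}} \preceq (1 + 0.1)S_{x^{(0)}}$ and therefore

$$\left\|S_{x^{(t)}}^{-1}\left(s_{x^{(t+1)}} - s_{x^{(t)}}\right)\right\|_2 \le \frac{1}{1 - 0.1}\left\|S_{x^{(0)}}^{-1}A\left(x^{(t+1)} - x^{(t)}\right)\right\|_2 \le \frac{1}{(1 - 0.1)\sqrt{c_e + \mu(x^{(0)})}}\|x^{(t+1)} - x^{(t)}\|_Q \le \frac{1}{8(1 - 0.1)\sqrt{c_e + \mu(x^{(0)})}}\|\nabla p_{e^{(t)}}(x^{(t)})\|_{Q^{-1}} \quad (6.4)$$

where in the second inequality we used $Q \succeq (c_e + \min_{i\in[m]} w_i)A_{x^{(0)}}^\top A_{x^{(0)}}$ and $\min_{i\in[m]} w_i \ge \mu(x^{(0)})$. Now since

$$e^{(t+1)} - e^{(t)} = \Delta^{(t+1)} - \left(\psi(x^{(t+1)}) - \psi(x^{(t)})\right),$$

consequently, with high probability in $n$,

$$\|e^{(t+1)} - e^{(t)}\|_2 \le c_\Delta\left\|S_{x^{(t)}}^{-1}\left(s_{x^{(t+1)}} - s_{x^{(t)}}\right)\right\|_2 \le \frac{c_\Delta}{8(1 - 0.1)\sqrt{c_e + \mu(x^{(0)})}}\|\nabla p_{e^{(t)}}(x^{(t)})\|_{Q^{-1}}.$$

Since $c_\Delta \le 0.01c_e$, by (6.3), we have

$$\|\nabla p_{e^{(t+1)}}(x^{(t+1)})\|_{Q^{-1}} \le \left(1 - \frac{1}{32}\right)\|\nabla p_{e^{(t)}}(x^{(t)})\|_{Q^{-1}} + \frac{0.01c_e}{8(1 - 0.1)(c_e + \mu(x^{(0)}))}\|\nabla p_{e^{(t)}}(x^{(t)})\|_{Q^{-1}} \le \left(1 - \frac{1}{64}\right)\|\nabla p_{e^{(t)}}(x^{(t)})\|_{Q^{-1}}.$$

Furthermore, this implies that

$$\|x^{(t+1)} - x^{(0)}\|_Q \le \sum_{k=0}^{t}\|x^{(k+1)} - x^{(k)}\|_Q = \frac{1}{8}\sum_{k=0}^{t}\|\nabla p_{e^{(k)}}(x^{(k)})\|_{Q^{-1}} \le \frac{1}{8}\sum_{k=0}^{\infty}\left(1 - \frac{1}{64}\right)^k\eta = 8\eta.$$

Similarly, we have that

$$\|e^{(t+1)} - e^{(0)}\|_2 \le \sum_{k=0}^{t}\frac{c_\Delta}{8(1 - 0.1)\sqrt{c_e + \mu(x^{(0)})}}\left(1 - \frac{1}{64}\right)^k\eta \le \frac{8c_\Delta\,\eta}{(1 - 0.1)\sqrt{c_e + \mu(x^{(0)})}} \le \frac{8c_\Delta}{1 - 0.1}\cdot\frac{1}{100} \le \frac{c_\Delta}{10}$$

where we used $\eta = \|\nabla p_{e^{(0)}}(x^{(0)})\|_{Q^{-1}} \le \|\nabla p_{e^{(0)}}(x^{(0)})\|_{H(x^{(0)})^{-1}} = \delta_{e^{(0)}}(x^{(0)})$, and this finishes the induction on $\|\nabla p_{e^{(t)}}(x^{(t)})\|_{Q^{-1}}$, $\|x^{(t)} - x^{(0)}\|_Q$, and $\|e^{(t)} - e^{(0)}\|_2$.

Hence, for all $r$, Lemma 11 shows that

$$\delta_{e^{(r)}}(x^{(r)}) = \|\nabla p_{e^{(r)}}(x^{(r)})\|_{H(x^{(r)})^{-1}} \le \sqrt{2}\,\|\nabla p_{e^{(r)}}(x^{(r)})\|_{H(x^{(0)})^{-1}} \le \sqrt{2}\sqrt{\frac{4}{3}}\,\|\nabla p_{e^{(r)}}(x^{(r)})\|_{Q^{-1}} \le 2\left(1 - \frac{1}{64}\right)^r\delta_{e^{(0)}}(x^{(0)}).$$

Using that $\mathbb{E}[e^{(k)}] = e^{(k-1)}$ (conditioned on the previous iterates) and that $p_e(x)$ is linear in $e$, we see that the expected change in function value is only due to the change while taking centering steps, and therefore Lemma 12 shows that

$$\mathbb{E}[p_{e^{(r)}}(x^{(r)})] \ge p_{e^{(0)}}(x^{(0)}) - \frac{1}{8}\sum_{k=0}^{r-1}\left(1 - \frac{1}{64}\right)^{2k}\|\nabla p_{e^{(0)}}(x^{(0)})\|_{Q^{-1}}^2 \ge p_{e^{(0)}}(x^{(0)}) - 8\left(\delta_{e^{(0)}}(x^{(0)})\right)^2.$$

Finally, for (4), we note that

$$\left\|S_{x^{(0)}}^{-1}\left(s(x^{(r)}) - s(x^{(0)})\right)\right\|_2 = \left\|A_{x^{(0)}}\left(x^{(r)} - x^{(0)}\right)\right\|_2 \le \frac{1}{\sqrt{\mu(x^{(0)}) + c_e}}\|x^{(r)} - x^{(0)}\|_Q \le \frac{8\eta}{\sqrt{\mu(x^{(0)}) + c_e}} \le \frac{1}{10}.$$

□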
6.2 Changing Constraints

Here we bound the effect that adding or removing a constraint has on the hybrid barrier function. Much of the analysis in this section follows from the following lemma, which follows easily from the Sherman-Morrison formula.


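Lemma 15 (Sherman-Morrison Formula Implications). Let $B \in \mathbb{R}^{n\times n}$ be a symmetric positive definite matrix and let $a \in \mathbb{R}^n$ be an arbitrary vector satisfying $a^\top B^{-1}a < 1$. The following hold:

1. $(B \pm aa^\top)^{-1} = B^{-1} \mp \frac{B^{-1}aa^\top B^{-1}}{1 \pm a^\top B^{-1}a}$.
2. $0 \preceq \frac{B^{-1}aa^\top B^{-1}}{1 \pm a^\top B^{-1}a} \preceq \frac{a^\top B^{-1}a}{1 \pm a^\top B^{-1}a}B^{-1}$.
3. $\ln\det(B \pm aa^\top) = \ln\det B + \ln\left(1 \pm a^\top B^{-1}a\right)$.

Proof. (1) follows immediately from the Sherman-Morrison formula [95]. (2) follows since $aa^\top$ is PSD,

$$\frac{B^{-1}aa^\top B^{-1}}{1 \pm a^\top B^{-1}a} = B^{-1/2}\left(\frac{B^{-1/2}aa^\top B^{-1/2}}{1 \pm a^\top B^{-1}a}\right)B^{-1/2},$$

and $yy^\top \preceq \|y\|_2^2 I$ for any vector $y$. (3) follows immediately from the matrix determinant lemma. □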
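The three facts of Lemma 15 are easy to verify numerically. The following sketch (hypothetical helper names, dense matrices) implements the rank-one inverse and log-determinant updates for both signs; the test also checks the operator inequality in (2).

```python
import numpy as np


def sm_inverse_update(B_inv, a, sign):
    """Lemma 15 (1): (B + sign * a a^T)^{-1} from B^{-1}, for sign in {+1, -1}."""
    Ba = B_inv @ a
    return B_inv - sign * np.outer(Ba, Ba) / (1.0 + sign * (a @ Ba))


def logdet_update(logdet_B, B_inv, a, sign):
    """Lemma 15 (3): ln det(B + sign * a a^T) = ln det B + ln(1 + sign * a^T B^{-1} a)."""
    return logdet_B + np.log(1.0 + sign * (a @ B_inv @ a))
```

These updates are what allow the method to adjust $\psi$ and $\log\det$ after adding or removing a single constraint without refactoring from scratch.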


We also make use of the following technical helper lemma. 
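Lemma 16. For $A \in \mathbb{R}^{m\times n}$ and all $a \in \mathbb{R}^n$ we have

$$\sum_{i\in[m]}\frac{1}{\psi_A[i]}\left(1_i^\top A(A^\top A + \lambda I)^{-1}a\right)^4 \le \left(a^\top(A^\top A + \lambda I)^{-1}a\right)^2,$$

where $\psi_A[i] \stackrel{\text{def}}{=} 1_i^\top A(A^\top A + \lambda I)^{-1}A^\top 1_i$.

Proof. We have by Cauchy-Schwarz that

$$\left(1_i^\top A(A^\top A + \lambda I)^{-1}a\right)^2 \le \psi_A[i]\cdot a^\top(A^\top A + \lambda I)^{-1}a$$

and consequently

$$\sum_{i\in[m]}\frac{1}{\psi_A[i]}\left(1_i^\top A(A^\top A + \lambda I)^{-1}a\right)^4 \le \left(a^\top(A^\top A + \lambda I)^{-1}a\right)\sum_{i\in[m]}\left(1_i^\top A(A^\top A + \lambda I)^{-1}a\right)^2.$$

Since

$$\sum_{i\in[m]}\left(1_i^\top A(A^\top A + \lambda I)^{-1}a\right)^2 = a^\top(A^\top A + \lambda I)^{-1}A^\top A(A^\top A + \lambda I)^{-1}a \le a^\top(A^\top A + \lambda I)^{-1}a,$$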

we have the desired result. □
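As a quick numerical illustration of Lemma 16, the sketch below evaluates both sides for the $\lambda$-regularized leverage scores $\psi_A$ (dense solves; `lemma16_sides` is a hypothetical helper name).

```python
import numpy as np


def lemma16_sides(A, a, lam):
    """Return (lhs, rhs) of Lemma 16 for the lambda-regularized leverage scores of A."""
    M = A.T @ A + lam * np.eye(A.shape[1])
    psi = np.einsum("ij,ji->i", A, np.linalg.solve(M, A.T))
    proj = A @ np.linalg.solve(M, a)  # entries 1_i^T A M^{-1} a
    lhs = np.sum(proj ** 4 / psi)
    rhs = float(a @ np.linalg.solve(M, a)) ** 2
    return lhs, rhs
```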

We now bound the effect of adding a constraint.
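Lemma 17. Let $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$, $\tau \in \mathbb{R}^m$, and $x \in P = \{y \in \mathbb{R}^n : Ay \ge b\}$. Let $\overline{A} \in \mathbb{R}^{(m+1)\times n}$ be $A$ with a row $a_{m+1}$ added, let $\overline{b} \in \mathbb{R}^{m+1}$ be the vector $b$ with an entry $b_{m+1}$ added, and let $\overline{P} = \{y \in \mathbb{R}^n : \overline{A}y \ge \overline{b}\}$. Let $s_{m+1} = a_{m+1}^\top x - b_{m+1} > 0$ and $\psi_a = s_{m+1}^{-2}a_{m+1}^\top(A_x^\top A_x + \lambda I)^{-1}a_{m+1}$.

Now, let $v \in \mathbb{R}^{m+1}$ be defined so that $v_{m+1} = \frac{\psi_a}{1 + \psi_a}$ and for all $i \in [m]$

$$v_i = \tau_i - \frac{1}{1 + \psi_a}\left(1_i^\top A_x\left(A_x^\top A_x + \lambda I\right)^{-1}\frac{a_{m+1}}{s_{m+1}}\right)^2.$$

Then, the following hold:

• [Leverage Score Estimation] $e_{\overline P}(v, x)_{m+1} = 0$ and $e_{\overline P}(v, x)_i = e_P(\tau, x)_i$ for all $i \in [m]$.

• [Function Value Increase] $p_{e_{\overline P}(v,x)}(x) = p_{e_P(\tau,x)}(x) - c_e\ln s(x)_{m+1} + \frac{1}{2}\ln(1 + \psi_a)$.

• [Centrality Increase] $\delta_{e_{\overline P}(v,x)}(x) \le \delta_{e_P(\tau,x)}(x) + (c_e + \psi_a)\sqrt{\frac{\psi_a}{\mu(x)}} + \psi_a$.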


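Proof. By (1) in Lemma 15, we have that for all $i \in [m]$

$$\psi_{\overline P}(x)_i = \psi_P(x)_i - \frac{1}{1 + \psi_a}\left(1_i^\top A_x\left(A_x^\top A_x + \lambda I\right)^{-1}\frac{a_{m+1}}{s_{m+1}}\right)^2$$

and that

$$\psi_{\overline P}(x)_{m+1} = \psi_a - \frac{\psi_a^2}{1 + \psi_a} = \frac{\psi_a}{1 + \psi_a}.$$

Consequently [Leverage Score Estimation] holds. Furthermore, by (3) in Lemma 15 this then implies that [Function Value Increase] holds.

To bound the change in centrality note that by (2) in Lemma 15 we have that $\overline{H}(x)^{-1} \preceq H(x)^{-1}$. Therefore if we let $v' \in \mathbb{R}^m$ be defined so that $v'_i = v_i$ for all $i \in [m]$ then by the triangle inequality we have

$$\begin{aligned}
\delta_{e_{\overline P}(v,x)}(x) &= \left\|\overline{A}_x^\top(c_e\mathbf{1} + v) - \lambda x\right\|_{\overline{H}^{-1}} \le \left\|\overline{A}_x^\top(c_e\mathbf{1} + v) - \lambda x\right\|_{H^{-1}}\\
&\le \left\|A_x^\top(c_e\mathbf{1} + \tau) - \lambda x\right\|_{H^{-1}} + \left(c_e + v_{m+1}\right)\left\|\frac{a_{m+1}}{s_{m+1}}\right\|_{H^{-1}} + \left\|A_x^\top(v' - \tau)\right\|_{H^{-1}}\\
&= \delta_{e_P(\tau,x)}(x) + \left(c_e + \frac{\psi_a}{1 + \psi_a}\right)\left\|\frac{a_{m+1}}{s_{m+1}}\right\|_{H^{-1}} + \left\|A_x^\top(v' - \tau)\right\|_{H^{-1}}.
\end{aligned}$$

Now, since $H \succeq \mu(x)\left(A_x^\top A_x + \lambda I\right)$ we have that

$$\left\|\frac{a_{m+1}}{s_{m+1}}\right\|_{H^{-1}} \le \sqrt{\frac{1}{\mu(x)}}\left\|\frac{a_{m+1}}{s_{m+1}}\right\|_{(A_x^\top A_x + \lambda I)^{-1}} = \sqrt{\frac{\psi_a}{\mu(x)}}.$$

Since $\Psi^{1/2}A_x\left(A_x^\top\Psi A_x\right)^{-1}A_x^\top\Psi^{1/2}$ is a projection matrix, we have $A_x\left(A_x^\top\Psi A_x\right)^{-1}A_x^\top \preceq \Psi^{-1}$, and since $H \succeq A_x^\top\Psi A_x$, Lemma 16 gives

$$\left\|A_x^\top(\tau - v')\right\|_{H^{-1}}^2 \le \|\tau - v'\|_{\Psi^{-1}}^2 = \sum_{i\in[m]}\frac{1}{\psi(x)_i}\left(\frac{1}{1 + \psi_a}\right)^2\left(1_i^\top A_x\left(A_x^\top A_x + \lambda I\right)^{-1}\frac{a_{m+1}}{s_{m+1}}\right)^4 \le \frac{1}{(1 + \psi_a)^2}\left(\frac{a_{m+1}^\top(A_x^\top A_x + \lambda I)^{-1}a_{m+1}}{s_{m+1}^2}\right)^2 = \left(\frac{\psi_a}{1 + \psi_a}\right)^2.$$

Combining, we have that

$$\delta_{e_{\overline P}(v,x)}(x) \le \delta_{e_P(\tau,x)}(x) + \left(c_e + \frac{\psi_a}{1 + \psi_a}\right)\sqrt{\frac{\psi_a}{\mu(x)}} + \frac{\psi_a}{1 + \psi_a} \le \delta_{e_P(\tau,x)}(x) + (c_e + \psi_a)\sqrt{\frac{\psi_a}{\mu(x)}} + \psi_a.$$

□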
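The [Leverage Score Estimation] claim of Lemma 17 says that the update $\tau \mapsto v$ leaves $e$ unchanged on the old rows and makes it exactly zero on the new row. The sketch below checks this with exact leverage scores; dense solves are an assumption, and `psi` and `add_row_estimate` are hypothetical names.

```python
import numpy as np


def psi(A, b, x, lam):
    """psi(x) = diag(A_x (A_x^T A_x + lam I)^{-1} A_x^T)."""
    Ax = A / (A @ x - b)[:, None]
    M = Ax.T @ Ax + lam * np.eye(A.shape[1])
    return np.einsum("ij,ji->i", Ax, np.linalg.solve(M, Ax.T))


def add_row_estimate(A, b, x, tau, lam, a_new, b_new):
    """Lemma 17: leverage-score estimate v after appending the row (a_new, b_new)."""
    s = A @ x - b
    Ax = A / s[:, None]
    M = Ax.T @ Ax + lam * np.eye(A.shape[1])
    a_bar = a_new / (a_new @ x - b_new)  # a_{m+1} / s_{m+1}; requires s_{m+1} > 0
    Minv_a = np.linalg.solve(M, a_bar)
    psi_a = a_bar @ Minv_a
    c = Ax @ Minv_a  # c_i = 1_i^T A_x M^{-1} a_{m+1} / s_{m+1}
    return np.append(tau - c ** 2 / (1.0 + psi_a), psi_a / (1.0 + psi_a))
```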


We now bound the effect of removing a constraint. 


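Lemma 18 (Removing a Constraint). Let $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$, $\tau \in \mathbb{R}^m$, and $x \in P = \{y \in \mathbb{R}^n : Ay \ge b\}$. Let $\overline{A} \in \mathbb{R}^{(m-1)\times n}$ be $A$ with row $m$ removed, let $\overline{b} \in \mathbb{R}^{m-1}$ denote the first $m - 1$ coordinates of $b$, and let $\overline{P} = \{y \in \mathbb{R}^n : \overline{A}y \ge \overline{b}\}$. Let $\psi_d = \psi_P(x)_m$.

Now, let $v \in \mathbb{R}^{m-1}$ be defined so that for all $i \in [m - 1]$

$$v_i = \tau_i + \frac{1}{1 - \psi_d}\left(1_i^\top A_x\left(A_x^\top A_x + \lambda I\right)^{-1}A_x^\top 1_m\right)^2.$$

Assume $\psi_d \le 1.1\mu(x) \le \frac{1}{10}$ and $\|e_P(\tau, x)\|_\infty \le c_e \le \frac{1}{5}$. Then we have the following:

• [Leverage Score Estimation] $e_{\overline P}(v, x)_i = e_P(\tau, x)_i$ for all $i \in [m - 1]$.

• [Function Value Decrease] $p_{e_{\overline P}(v,x)}(x) = p_{e_P(\tau,x)}(x) + \left[c_e + e_P(\tau, x)_m\right]\ln s(x)_m + \frac{1}{2}\ln(1 - \psi_d)$.

• [Centrality Increase] $\delta_{e_{\overline P}(v,x)}(x) \le \frac{1}{\sqrt{1 - 1.3\mu(x)}}\delta_{e_P(\tau,x)}(x) + 3(c_e + \mu(x))$.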

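Proof. By (1) in Lemma 15, we have that for all $i \in [m - 1]$

$$\psi_{\overline P}(x)_i = \psi_P(x)_i + \frac{1}{1 - \psi_d}\left(1_i^\top A_x\left(A_x^\top A_x + \lambda I\right)^{-1}A_x^\top 1_m\right)^2.$$

Consequently, [Leverage Score Estimation] holds. Furthermore, by (3) in Lemma 15, this then implies that [Function Value Decrease] holds.

To bound the change in centrality we first note that by (1) and (2) in Lemma 15, we have that the approximate Hessian for $\overline P$, denoted $\overline{H}(x)$, is bounded by

$$\overline{H}(x)^{-1} \preceq \left(H(x) - A_x^\top(c_eI + \Psi_x)^{1/2}1_m1_m^\top(c_eI + \Psi_x)^{1/2}A_x\right)^{-1} \preceq \frac{1}{1 - \alpha}H(x)^{-1}$$

where $\alpha = 1_m^\top(c_eI + \Psi_x)^{1/2}A_xH(x)^{-1}A_x^\top(c_eI + \Psi_x)^{1/2}1_m$. Using $c_e + \mu(x) \le 1$, we have

$$H(x)^{-1} \preceq \left(A_x^\top(c_e + \mu(x))A_x + \lambda I\right)^{-1} \preceq (c_e + \mu(x))^{-1}\left(A_x^\top A_x + \lambda I\right)^{-1}. \quad (6.5)$$

Using this, we have

$$\alpha \le \frac{c_e + \psi_d}{c_e + \mu(x)}1_m^\top A_x\left(A_x^\top A_x + \lambda I\right)^{-1}A_x^\top 1_m = \frac{c_e + \psi_d}{c_e + \mu(x)}\psi_d.$$

Now let $\tau' \in \mathbb{R}^{m-1}$ be defined so that $\tau'_i = \tau_i$ for all $i \in [m - 1]$. We have by the above that

$$\delta_{e_{\overline P}(v,x)}(x) = \left\|\overline{A}_x^\top(c_e\mathbf{1} + v) - \lambda x\right\|_{\overline{H}(x)^{-1}} \le \frac{1}{\sqrt{1 - \alpha}}\left\|\overline{A}_x^\top(c_e\mathbf{1} + v) - \lambda x\right\|_{H(x)^{-1}} \quad (6.6)$$

and therefore, by the triangle inequality,

$$\begin{aligned}
\left\|\overline{A}_x^\top(c_e\mathbf{1} + v) - \lambda x\right\|_{H(x)^{-1}} &\le \left\|A_x^\top(c_e\mathbf{1} + \tau) - \lambda x\right\|_{H(x)^{-1}} + \left\|A_x^\top 1_m(c_e + \tau_m)\right\|_{H(x)^{-1}} + \left\|\overline{A}_x^\top(\tau' - v)\right\|_{H(x)^{-1}}\\
&= \delta_{e_P(\tau,x)}(x) + (c_e + \tau_m)\left\|A_x^\top 1_m\right\|_{H(x)^{-1}} + \left\|\overline{A}_x^\top(\tau' - v)\right\|_{H(x)^{-1}}.
\end{aligned}$$

Now, (6.5) shows that

$$\left\|A_x^\top 1_m\right\|_{H(x)^{-1}} \le \frac{1}{\sqrt{c_e + \mu(x)}}\left\|A_x^\top 1_m\right\|_{(A_x^\top A_x + \lambda I)^{-1}} = \sqrt{\frac{\psi_d}{c_e + \mu(x)}}.$$

Furthermore, since $H(x) \succeq A_x^\top\Psi_xA_x$ and $A_x\left(A_x^\top\Psi_xA_x\right)^{-1}A_x^\top \preceq \Psi_x^{-1}$, by Lemma 16 we have

$$\left\|\overline{A}_x^\top(\tau' - v)\right\|_{H(x)^{-1}}^2 \le \|\tau' - v\|_{\Psi_x^{-1}}^2 = \sum_{i\in[m-1]}\frac{1}{\psi(x)_i}\left(\frac{1}{1 - \psi_d}\right)^2\left(1_i^\top A_x\left(A_x^\top A_x + \lambda I\right)^{-1}A_x^\top 1_m\right)^4 \le \left(\frac{\psi_d}{1 - \psi_d}\right)^2.$$

Combining, we have that

$$\delta_{e_{\overline P}(v,x)}(x) \le \frac{1}{\sqrt{1 - \alpha}}\left[\delta_{e_P(\tau,x)}(x) + (c_e + \tau_m)\sqrt{\frac{\psi_d}{c_e + \mu(x)}} + \frac{\psi_d}{1 - \psi_d}\right].$$

Using the assumptions $\psi_d \le 1.1\mu(x) \le \frac{1}{10}$ and $\|e_P(\tau, x)\|_\infty \le c_e$, we have $\alpha \le 1.1\psi_d \le 1.21\mu(x)$ and $\tau_m \le \psi_d + c_e$, and hence

$$\begin{aligned}
\delta_{e_{\overline P}(v,x)}(x) &\le \frac{1}{\sqrt{1 - 1.3\mu(x)}}\left[\delta_{e_P(\tau,x)}(x) + (c_e + \tau_m)\sqrt{1.1} + 1.2\psi_d\right]\\
&\le \frac{1}{\sqrt{1 - 1.3\mu(x)}}\delta_{e_P(\tau,x)}(x) + \frac{1}{\sqrt{1 - \frac{1.3}{10}}}\left(\sqrt{1.1}\cdot 2c_e + \left(\sqrt{1.1}\cdot 1.1 + 1.2\cdot 1.1\right)\mu(x)\right)\\
&\le \frac{1}{\sqrt{1 - 1.3\mu(x)}}\delta_{e_P(\tau,x)}(x) + 3(c_e + \mu(x)).
\end{aligned}$$

□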
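Similarly, the [Leverage Score Estimation] claim of Lemma 18 can be checked by removing the last row and recomputing the scores exactly. This is a sketch under the same assumptions (dense solves, exact scores); `remove_last_row_estimate` is a hypothetical name.

```python
import numpy as np


def psi(A, b, x, lam):
    """psi(x) = diag(A_x (A_x^T A_x + lam I)^{-1} A_x^T)."""
    Ax = A / (A @ x - b)[:, None]
    M = Ax.T @ Ax + lam * np.eye(A.shape[1])
    return np.einsum("ij,ji->i", Ax, np.linalg.solve(M, Ax.T))


def remove_last_row_estimate(A, b, x, tau, lam):
    """Lemma 18: estimate v for the polytope with the last row of (A, b) removed."""
    Ax = A / (A @ x - b)[:, None]
    M = Ax.T @ Ax + lam * np.eye(A.shape[1])
    col = Ax @ np.linalg.solve(M, Ax[-1])  # entries 1_i^T A_x M^{-1} A_x^T 1_m
    psi_d = col[-1]  # psi_P(x)_m < 1 because lam > 0
    return tau[:-1] + col[:-1] ** 2 / (1.0 - psi_d)
```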
6.3 Hybrid Center Properties

Here we prove properties of points near the hybrid center. First we bound the distance between points in the $H(x)$ norm in terms of the $\ell_2$ norm of the points.

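Lemma 19. For $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$ suppose that $x \in P = \{y : Ay \ge b\}$ and $e \in \mathbb{R}^m$ such that $\|e\|_\infty \le \frac{1}{2}c_e \le \frac{1}{20}$ and $\delta_e(x) \le 0.1\sqrt{c_e + \mu(x)}$. Then for all $y \in P$ we have

$$\|x - y\|_{H(x)} \le \frac{12mc_e + 6n + 2\lambda\|y\|_2^2}{\sqrt{c_e + \mu(x)}} \quad (6.7)$$

and

$$\|x\|_2^2 \le 4\lambda^{-1}(mc_e + n) + 2\|y\|_2^2.$$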

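Proof. For notational simplicity let $t = c_e\mathbf{1} + e + \psi_x$, $T = \mathrm{diag}(t)$, and $Q = A_x^\top(c_eI + \Psi_x)A_x$. We have

$$\|x - y\|_{A_x^\top TA_x}^2 = \sum_{i\in[m]}t_i\frac{[s_x - s_y]_i^2}{[s_x]_i^2} = \sum_{i\in[m]}t_i - 2\sum_{i\in[m]}t_i\frac{[s_y]_i}{[s_x]_i} + \sum_{i\in[m]}t_i\frac{[s_y]_i^2}{[s_x]_i^2} \quad (6.8)$$

and

$$\sum_{i\in[m]}t_i\frac{[s_y]_i^2}{[s_x]_i^2} \le \left(\max_{i\in[m]}\frac{[s_y]_i}{[s_x]_i}\right)\sum_{i\in[m]}t_i\frac{[s_y]_i}{[s_x]_i} \le \left(1 + \left\|S_x^{-1}(s_y - s_x)\right\|_\infty\right)\sum_{i\in[m]}t_i\frac{[s_y]_i}{[s_x]_i} \quad (6.9)$$

and

$$\left\|S_x^{-1}(s_y - s_x)\right\|_\infty = \max_{i\in[m]}\left|1_i^\top S_x^{-1}A(x - y)\right| \le \|x - y\|_{H(x)}\max_{i\in[m]}\left[1_i^\top S_x^{-1}AH(x)^{-1}A^\top S_x^{-1}1_i\right]^{1/2} \le (c_e + \mu(x))^{-1/2}\|x - y\|_{H(x)}. \quad (6.10)$$

Now, clearly $\sum_{i\in[m]}t_i[s_y]_i/[s_x]_i$ is positive, and since $\|e\|_\infty \le \frac{1}{2}c_e$ we know that $Q \preceq 2A_x^\top TA_x$. Therefore, by combining (6.8), (6.9), and (6.10) we have

$$\frac{1}{2}\|x - y\|_Q^2 \le \|t\|_1 + \left(\sum_{i\in[m]}t_i\frac{[s_y]_i}{[s_x]_i}\right)\frac{\|x - y\|_{H(x)}}{\sqrt{c_e + \mu(x)}}. \quad (6.11)$$

Now since $\nabla p_e(x) = -A^\top S_x^{-1}t + \lambda x$ we have

$$\langle x - y, \nabla p_e(x)\rangle = -\sum_{i\in[m]}t_i\frac{[s_x - s_y]_i}{[s_x]_i} + \lambda x^\top(x - y)$$

and therefore

$$\sum_{i\in[m]}t_i\frac{[s_y]_i}{[s_x]_i} = \|t\|_1 - \lambda\|x\|_2^2 + \lambda x^\top y + \langle x - y, \nabla p_e(x)\rangle. \quad (6.12)$$

Hence, by Cauchy-Schwarz and $x^\top y \le \|x\|_2^2 + \frac{1}{4}\|y\|_2^2$,

$$\sum_{i\in[m]}t_i\frac{[s_y]_i}{[s_x]_i} \le \|t\|_1 + \frac{\lambda}{4}\|y\|_2^2 + \|x - y\|_{H(x)}\delta_e(x). \quad (6.13)$$

Now, using (6.11), (6.13) and the definition of $H(x)$, we have

$$\frac{1}{2}\|x - y\|_{H(x)}^2 = \frac{1}{2}\|x - y\|_Q^2 + \frac{\lambda}{2}\|x - y\|_2^2 \le \|t\|_1 + \left(\|t\|_1 + \frac{\lambda}{4}\|y\|_2^2 + \|x - y\|_{H(x)}\delta_e(x)\right)\frac{\|x - y\|_{H(x)}}{\sqrt{c_e + \mu(x)}} + \frac{\lambda}{2}\|x - y\|_2^2.$$

Using the fact that $\delta_e(x) \le 0.1\sqrt{c_e + \mu(x)}$, we have

$$\frac{1}{2}\|x - y\|_{H(x)}^2 \le \|t\|_1 + \frac{\|t\|_1 + \frac{\lambda}{4}\|y\|_2^2}{\sqrt{c_e + \mu(x)}}\|x - y\|_{H(x)} + \frac{1}{10}\|x - y\|_{H(x)}^2 + \frac{\lambda}{2}\|x - y\|_2^2.$$

Furthermore, since $\sum_{i\in[m]}t_i[s_y]_i/[s_x]_i$ is positive, (6.12) shows that

$$\lambda x^\top(x - y) = \lambda\|x\|_2^2 - \lambda x^\top y \le \|t\|_1 + \langle x - y, \nabla p_e(x)\rangle \le \|t\|_1 + \|x - y\|_{H(x)}\delta_e(x) \quad (6.14)$$

and hence

$$\frac{\lambda}{2}\|x - y\|_2^2 = \lambda x^\top(x - y) - \frac{\lambda}{2}\|x\|_2^2 + \frac{\lambda}{2}\|y\|_2^2 \le \|t\|_1 + \frac{\lambda}{2}\|y\|_2^2 + \|x - y\|_{H(x)}\delta_e(x). \quad (6.15)$$

Putting (6.15) into the previous bound on $\frac{1}{2}\|x - y\|_{H(x)}^2$ and using the fact that $\delta_e(x) \le 0.1\sqrt{c_e + \mu(x)}$, we have

$$\frac{2}{5}\|x - y\|_{H(x)}^2 \le 2\|t\|_1 + \frac{\lambda}{2}\|y\|_2^2 + \left(\frac{\|t\|_1 + \frac{\lambda}{4}\|y\|_2^2}{\sqrt{c_e + \mu(x)}} + \frac{\sqrt{c_e + \mu(x)}}{10}\right)\|x - y\|_{H(x)}.$$

Now, using $\|t\|_1 \le 2mc_e + n$, we have

$$\frac{2}{5}\|x - y\|_{H(x)}^2 \le 2\alpha + \left(\frac{\alpha}{\sqrt{c_e + \mu(x)}} + \frac{\sqrt{c_e + \mu(x)}}{10}\right)\|x - y\|_{H(x)}$$

for $\alpha = 2mc_e + n + \frac{\lambda}{4}\|y\|_2^2$. Since $\sqrt{c_e + \mu(x)} \le 1.05$ and $\alpha \ge n \ge 1$, we have $\alpha/\sqrt{c_e + \mu(x)} \ge 0.9$, and hence solving this quadratic inequality in $\|x - y\|_{H(x)}$ gives

$$\|x - y\|_{H(x)} \le \frac{6\alpha}{\sqrt{c_e + \mu(x)}},$$

yielding (6.7).

We also have by (6.12) and the fact that $\delta_e(x) \le 0.1\sqrt{c_e + \mu(x)}$,

$$\lambda\|x\|_2^2 = \|t\|_1 + \lambda x^\top y + \langle x - y, \nabla p_e(x)\rangle - \sum_{i\in[m]}t_i\frac{[s_y]_i}{[s_x]_i} \le \|t\|_1 + \frac{\lambda}{2}\|x\|_2^2 + \frac{\lambda}{2}\|y\|_2^2 + 0.1\sqrt{c_e + \mu(x)}\,\|x - y\|_{H(x)}.$$

Hence, using $\|t\|_1 \le 2mc_e + n$ and $\|x - y\|_{H(x)} \le 6\alpha/\sqrt{c_e + \mu(x)}$, we have

$$\frac{\lambda}{2}\|x\|_2^2 \le \lambda\|y\|_2^2 + 2(mc_e + n).$$

□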


In the following lemma we show how we can write one hyperplane in terms of the others provided 
that we are nearly centered and show there is a constraint that the central point is close to. 

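Lemma 20. Let $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$ such that $\|a_i\|_2 = 1$ for all $i$. Suppose that $x \in P = \{y : Ay \ge b\}$ and $e \in \mathbb{R}^m$ such that $\|e\|_\infty \le \frac{1}{2}c_e \le \frac{1}{20}$. Furthermore, let $\epsilon = \min_{j\in[m]}s_j(x)$ and suppose that $i = \arg\min_{j\in[m]}s_j(x)$; then

$$\left\|a_i + \sum_{j\ne i}\frac{s(x)_i}{s(x)_j}\cdot\frac{c_e + e_j + \psi_j(x)}{c_e + e_i + \psi_i(x)}a_j\right\|_2 \le \frac{2\epsilon}{c_e + \mu(x)}\left(\lambda\|x\|_2 + \sqrt{\frac{mc_e + n}{\epsilon^2} + \lambda}\;\delta_e(x)\right).$$

Proof. We know that

$$\sum_{j\in[m]}\frac{c_e + e_j + \psi_j(x)}{s(x)_j}a_j = A^\top S_x^{-1}\left(c_e\mathbf{1} + e + \psi_x\right) = \lambda x - \nabla p_e(x).$$

Consequently, by $\|e\|_\infty \le \frac{1}{2}c_e$ and $\psi_i(x) \ge \mu(x)$,

$$\left\|a_i + \sum_{j\ne i}\frac{s(x)_i}{s(x)_j}\cdot\frac{c_e + e_j + \psi_j(x)}{c_e + e_i + \psi_i(x)}a_j\right\|_2 = \frac{s(x)_i}{c_e + e_i + \psi_i(x)}\left\|\lambda x - \nabla p_e(x)\right\|_2 \le \frac{2\epsilon}{c_e + \mu(x)}\left(\lambda\|x\|_2 + \|\nabla p_e(x)\|_2\right).$$

Using $\|a_i\|_2 = 1$, $\sum_i\psi_i(x) \le n$, and $s_i(x) \ge \epsilon$, we have

$$\mathrm{Tr}\left(A_x^\top(c_eI + \Psi_x)A_x\right) = \sum_{i\in[m]}\frac{(c_e + \psi_i(x))\|a_i\|_2^2}{s_i(x)^2} \le \frac{mc_e + n}{\epsilon^2}. \quad (6.16)$$

Hence, we have $H(x) \preceq \left(\frac{mc_e + n}{\epsilon^2} + \lambda\right)I$ and $\|\nabla p_e(x)\|_2 \le \sqrt{\frac{mc_e + n}{\epsilon^2} + \lambda}\;\delta_e(x)$, yielding the result. □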


6.4 The Algorithm 

Here, we put together all the results in the previous sections to obtain our cutting plane method. Below is a sketch of the pseudocode; we use $c_a, c_d, c_e, c_\Delta$ to denote parameters we decide later.

In the algorithm, there are two main invariants we maintain. First, we maintain that the centrality $\delta_{P,e}(x)$, which indicates how close $x$ is to the minimum point of $p_e$, is small. Second, we maintain that $\|e(\tau, x)\|_\infty$, which indicates how accurate the leverage score estimate $\tau$ is, is small. In the following lemma we show that we maintain both invariants throughout the algorithm.
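Algorithm 2: Our Cutting Plane Method

Input: $A^{(0)} \in \mathbb{R}^{m\times n}$, $b^{(0)} \in \mathbb{R}^m$, $\epsilon > 0$, and radius $R > 0$.
Input: A separation oracle for a non-empty set $K \subset B_\infty(R)$.
Check: Throughout the algorithm, if $s_i(x^{(k)}) < \epsilon$, output $P^{(k)}$.
Check: Throughout the algorithm, if $x^{(k)} \in K$, output $x^{(k)}$.
Set $P^{(0)} = \{x \in \mathbb{R}^n : A^{(0)}x \ge b^{(0)}\}$.
Set $x^{(0)} := 0$ and compute $\tau^{(0)}_i = \psi_{P^{(0)}}(x^{(0)})_i$ for all $i \in [m]$ exactly.
for $k = 0$ to $\infty$ do
    Let $m^{(k)}$ be the number of constraints in $P^{(k)}$.
    Compute $w^{(k)}$ such that $\Psi_{P^{(k)}}(x^{(k)}) \preceq W^{(k)} \preceq (1 + c_\Delta)\Psi_{P^{(k)}}(x^{(k)})$.
    Let $i^{(k)} \in \arg\max_{i\in[m^{(k)}]}\left|w_i^{(k)} - \tau_i^{(k)}\right|$.
    Set $\tau^{(k+1/2)}_{i^{(k)}} = \psi_{P^{(k)}}(x^{(k)})_{i^{(k)}}$ and $\tau^{(k+1/2)}_j = \tau^{(k)}_j$ for all $j \ne i^{(k)}$.
    if $\min_{i\in[m^{(k)}]}w_i^{(k)} \le c_d$ then
        Remove the constraint with minimum $w_i^{(k)}$, yielding polytope $P^{(k+1)}$.
        Update $\tau^{(k+1/2)}$ according to Lemma 18 to get $\tau^{(k+3/4)}$.
    else
        Use the separation oracle at $x^{(k)}$ to get a constraint $\{x : a^\top x \ge a^\top x^{(k)}\}$ with $\|a\|_2 = 1$.
        Add the constraint $\left\{x : a^\top x \ge a^\top x^{(k)} - c_a^{-1/2}\sqrt{a^\top\left(A_{x^{(k)}}^\top A_{x^{(k)}} + \lambda I\right)^{-1}a}\right\}$, yielding polytope $P^{(k+1)}$.
        Update $\tau^{(k+1/2)}$ according to Lemma 17 to get $\tau^{(k+3/4)}$.
    $(x^{(k+1)}, \tau^{(k+1)}) \leftarrow \text{Centering}(x^{(k)}, \tau^{(k+3/4)}, 200, c_\Delta)$.
end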
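In the addition branch, the offset of the new constraint is chosen so that the pre-update leverage quantity $\psi_a$ of Lemma 17 equals $c_a$ exactly. The numpy sketch below computes that right-hand side; the helper names are hypothetical and dense solves are an assumption. By Lemma 17, the new row's leverage score in the enlarged polytope is then $c_a/(1 + c_a)$, which the test checks.

```python
import numpy as np


def psi(A, b, x, lam):
    """psi(x) = diag(A_x (A_x^T A_x + lam I)^{-1} A_x^T)."""
    Ax = A / (A @ x - b)[:, None]
    M = Ax.T @ Ax + lam * np.eye(A.shape[1])
    return np.einsum("ij,ji->i", Ax, np.linalg.solve(M, Ax.T))


def cut_offset(A, b, x, lam, a, c_a):
    """Right-hand side b_new of {y : a^T y >= b_new} giving psi_a = c_a at x."""
    Ax = A / (A @ x - b)[:, None]
    M = Ax.T @ Ax + lam * np.eye(A.shape[1])
    q = a @ np.linalg.solve(M, a)  # a^T (A_x^T A_x + lam I)^{-1} a
    return a @ x - np.sqrt(q / c_a)  # slack s_{m+1} = sqrt(q / c_a)
```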


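Lemma 21. Assume that $c_e \le c_d \le c_a$ are sufficiently small constants (with $c_a$ small relative to $c_d$), and that $c_\Delta \le Cc_e/\log(n)$ for some small enough universal constant $C$. During our cutting plane method, for all $k$, with high probability in $n$, we have

1. $\|e(\tau^{(k+1/2)}, x^{(k)})\|_\infty \le \frac{1}{400}c_e$, $\|e(\tau^{(k+3/4)}, x^{(k)})\|_\infty \le \frac{1}{400}c_e$, and $\|e(\tau^{(k+1)}, x^{(k+1)})\|_\infty \le \frac{1}{400}c_e$.
2. $\delta_{P^{(k)}, e(\tau^{(k+1/2)}, x^{(k)})}(x^{(k)}) \le \frac{1}{200}\sqrt{c_e + \min(\mu(x^{(k)}), c_d)}$.
3. $\delta_{P^{(k+1)}, e(\tau^{(k+1)}, x^{(k+1)})}(x^{(k+1)}) \le \frac{1}{400}\sqrt{c_e + \min(\mu(x^{(k+1)}), c_d)}$.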

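Proof. Some statements of the proof hold only with high probability in $n$; we omit mentioning this for simplicity.

We prove the claims by induction on $k$. Note that the claims are written in an order consistent with the algorithm, and proving the statement for $k$ involves bounding centrality at the point $x^{(k+1)}$. For the base case we let the quantities with index $-1$ coincide with the initial ones, and note that the claims then hold for $k = -1$, as we compute the initial leverage scores, $\tau^{(0)}$, exactly and since the polytope is symmetric we have $\delta_{e(\tau^{(0)}, x^{(0)})}(0) = 0$. We now suppose they hold for all $r < t$ and show that they hold for $r = t$.

We first bound $\delta$. For notational simplicity, let $\eta_t \stackrel{\text{def}}{=} \sqrt{c_e + \min\{\mu(x^{(t)}), c_d\}}$. By the induction hypothesis we know that $\delta_{P^{(t)}, e(\tau^{(t)}, x^{(t)})}(x^{(t)}) \le \frac{1}{400}\eta_t$. Now, when we update $\tau^{(t)}$ to $\tau^{(t+1/2)}$, we set $e_{i^{(t)}}$ to $0$. Consequently, Lemma 13 and the induction hypothesis $\|e(\tau^{(t)}, x^{(t)})\|_\infty \le \frac{1}{400}c_e$ show that

$$\delta_{P^{(t)}, e(\tau^{(t+1/2)}, x^{(t)})}(x^{(t)}) \le \frac{\eta_t}{400} + \frac{c_e/400}{\sqrt{c_e + \mu(x^{(t)})}} \le \frac{\eta_t}{400} + \frac{\eta_t}{400} = \frac{\eta_t}{200}. \quad (6.17)$$

Next, we estimate the change in $\delta$ when we remove or add a constraint.

For the case of removal, we note that it happens only if $\mu(x^{(t)}) \le \min_i w_i \le c_d$. Also, the row we remove has leverage score at most $1.1\mu(x^{(t)})$ because we pick the row with minimum $w$. Hence, Lemma 18 and the smallness of $c_e$ and $c_d$ show that

$$\delta_{P^{(t+1)}, e(\tau^{(t+3/4)}, x^{(t)})}(x^{(t)}) \le \frac{1}{\sqrt{1 - 1.3\mu(x^{(t)})}}\delta_{P^{(t)}, e(\tau^{(t+1/2)}, x^{(t)})}(x^{(t)}) + 3(c_e + \mu(x^{(t)})) \le \frac{1}{\sqrt{1 - 2\cdot 10^{-6}}}\cdot\frac{\eta_t}{200} + 3\sqrt{c_e + c_d}\,\eta_t \le \frac{\eta_t}{100}$$

where we used the fact $\mu(x^{(t)}) \le c_d$ and hence $c_e + \mu(x^{(t)}) = \eta_t^2 \le \sqrt{c_e + c_d}\,\eta_t$.

For the case of addition, we note that it happens only if $2\mu(x^{(t)}) \ge \min_i w_i > c_d$. Furthermore, in this case the hyperplane we add is chosen precisely so that $\psi_a = c_a$. Furthermore, since $c_e \le c_d \le c_a$, by Lemma 17 we have that

$$\delta_{P^{(t+1)}, e(\tau^{(t+3/4)}, x^{(t)})}(x^{(t)}) \le \frac{\eta_t}{200} + (c_e + \psi_a)\sqrt{\frac{\psi_a}{\mu(x^{(t)})}} + \psi_a \le \frac{\eta_t}{200} + 4c_a\sqrt{\frac{c_a}{c_d}}.$$

Furthermore, since $\mu(x^{(t)}) > c_d/2$ and $c_a$ is sufficiently small compared to $c_d$, we know that $4c_a\sqrt{c_a/c_d} \le \frac{\eta_t}{200}$, and consequently in both cases we have $\delta_{P^{(t+1)}, e(\tau^{(t+3/4)}, x^{(t)})}(x^{(t)}) \le \frac{\eta_t}{100}$.

Now, note that Lemmas 17 and 18 show that $e$ does not change during the addition or removal of a constraint. Hence, we have $\|e(\tau^{(t+3/4)}, x^{(t)})\|_\infty \le \|e(\tau^{(t+1/2)}, x^{(t)})\|_\infty$. Furthermore, we know the step $\tau^{(t+1/2)}_{i^{(t)}} = \psi_{P^{(t)}}(x^{(t)})_{i^{(t)}}$ only decreases $\|e\|_\infty$, and hence we have $\|e(\tau^{(t+1/2)}, x^{(t)})\|_\infty \le \|e(\tau^{(t)}, x^{(t)})\|_\infty \le \frac{1}{400}c_e$. Thus, we have all the conditions needed for Lemma 14 and consequently

$$\delta_{P^{(t+1)}, e(\tau^{(t+1)}, x^{(t+1)})}(x^{(t+1)}) \le 2\left(1 - \frac{1}{64}\right)^{200}\delta_{P^{(t+1)}, e(\tau^{(t+3/4)}, x^{(t)})}(x^{(t)}) \le \frac{1}{1000}\eta_t.$$

Lemma 14 also shows that $\|S_{x^{(t)}}^{-1}(s(x^{(t+1)}) - s(x^{(t)}))\|_2 \le \frac{1}{10}$, and hence $\frac{9}{10}s(x^{(t)})_i \le s(x^{(t+1)})_i \le \frac{11}{10}s(x^{(t)})_i$ for all $i$. Therefore, $\eta_t \le 2\eta_{t+1}$ and thus

$$\delta_{P^{(t+1)}, e(\tau^{(t+1)}, x^{(t+1)})}(x^{(t+1)}) \le \frac{1}{400}\sqrt{c_e + \min\left(c_d, \mu(x^{(t+1)})\right)},$$

completing the induction case for $\delta$.

Now, we bound $\|e\|_\infty$. Lemmas 17 and 18 show that $e$ does not change during the addition or removal of a constraint. Hence, $e$ is affected only by the update step $\tau^{(k+1/2)}_{i^{(k)}} = \psi_{P^{(k)}}(x^{(k)})_{i^{(k)}}$ and the centering step. Using the induction hypothesis $\delta_{P^{(r)}, e(\tau^{(r+1/2)}, x^{(r)})}(x^{(r)}) \le \frac{1}{200}\eta_r$, Lemma 14 shows that $\mathbb{E}[e(\tau^{(r+1)}, x^{(r+1)})] = e(\tau^{(r+3/4)}, x^{(r)})$ and $\|e(\tau^{(r+1)}, x^{(r+1)}) - e(\tau^{(r+3/4)}, x^{(r)})\|_2 \le \frac{1}{10}c_\Delta$ for all $r < t$. The goal for the update step is to decrease $e$ by updating $\tau$. In Section 7.2, we give a self-contained analysis of the effect of this step as a game. In each round, the vector $e$ is corrupted by some mean 0 and bounded variance noise and the problem is how to update $e$ such that $\|e\|_\infty$ is bounded. Theorem 34 shows that we can do this by setting $e_i = 0$ for the almost maximum coordinate in each iteration. This is exactly what the update step is doing. Hence, Theorem 34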
shows that this strategy guarantees that after the update step, we have 


e(r 


(r-l-T 




0(CA log (n)) 


for all r < t. Now, by our choice of ca, we have ||e(r^*“'“3\Lemma 17 and 
18 show that e does not change during the addition or removal of an constraint. Hence, we 
have ||e(r^*'’' 3 \Now, we note that again Lemma 14 shows — 

e(r^*+3),< i^CA < X(^Ce, and we have This hnishes the 

induction case for e and proves this lemma. □ 

Next, we show the number of constraints is always linear to n. 

Lemma 22. Throughout our cutting plane method, there are at most 1 + ^ eonstraints. 

Proof. We only add a constraint if min* Wi > c^. Since 2'tpi > Wi, we have V’i ^ ^ for all i. Letting 
m denote the number of constraints after we add that row, we have n > > (m — l)(crf/2). □ 

Using iL 7 ^ 0 and K C Boo{R), here we show that the points are bounded. 

Lemma 23. During our Cutting Plane Method, for all k, we have < 6^/n/X + 2y/nR. 

Proof. By Lemma 21 and Lemma 19 we know that ||T ^^^||2 ^ 4A“^(mCe + n) + 2||y||2 for any 
y £ pW, Since our method never cuts out any ^oint in K and since K is nonempty, there is some 
y G iL C P^^\ Since K C Boo{R), we have ||y ||2 < nR. Furthermore, by Lemma 22 we have that 
mce < Ce + 2n < 3n yielding the result. □ 

Lemma 24. s, < 12y^n/A + A^/nR + ^^ for all i and k in the our cutting plane method. 

Proof. Let be the current point at the time that the constraint corresponding to Si, denoted 
{x : afx > afx^^') — Si{x^^^)}, was added. Clearly 


Si{x^^'^) = afx^^^ — ajx^^^ + < 




\x 


■(fc)l 


+ 




On the one hand, if the constraint for Si comes from the initial symmetric polytope P^^^ = i?oo(72), 
we know — Sj(x(-^^)| < R . On the other hand, if the constraint was added later then we 

know that 

Si{x^^^) = c“fo^Y^a^(A^S“2jA + AI)-ia < (caA)-^^ 

and — Sj(x^'^))| < ||aj|| • Since ||ai ||2 = 1 by design and ||^^■^^||2 

II 2 are upper bounded by 6^/n/X + 2^/nR by Lemma 23, in either case the result follows. □ 

Now, we have everything we need to prove that the potential function is increasing in expecta¬ 
tion. 
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Lemma 25. Under the assumptions of Lemma 21 if X = Ce = 6 in(i 7 n_R/£) ’ < c, 

then for all k we have 

EPe-(r(fc+l), 5 ?(fc+l))(x(^+^^) > -Crf + ln(l + /3) 

where f3 = Ca for the case of adding a constraint /3 = —Cd for the case of removal. 

Proo^. Note that there are three places which affect the function value, namely the update step for 
the addition/removal of constraints, and the centering step. We bound the effect of each 
separately. 

First, for the update step, we have 

= -ei(fc) log(SiW (#'=))) + 

Lemma 24, the termination condition and A = ensure that 

’ CaR^ 


e < Sj_(k) < 12y/njX + Ay/nR + J —- < 17y/nR 

V Ca A 

and Lemma 21 shows that \e^{k) \ < Cg. Hence, we have 

^ Pe(r(fc),a-W)(®^^^) “ Cg log(17ni?/e). 

For the addition step, Lemma 17 shows that 

- Cglns(x )^+1 + ln(l + Ca) 

> Pg^^^{k+^) - Cglog(17ni?/e) + ln(l + Ca) 

and for the removal step. Lemma 18 and \ei\ < Cg shows that 

+ ep{f,x)m\ Ins(f)™ + ln(l - Cd) 
- log(17nii/e) + ln(l - Cd) 

After the addition or removal of a constraint. Lemma 21 shows that 

1 


P('“),e(r(''+3\x('=))"'“ " — 100 

and therefore Lemma 14 and Cg < Cd show that 


< — ^Cg + min (/r(f(^)), Cd) 


Ep^(^(fe+i)_5j(fe+i))(f(^+^)) > p 






Y^Cg + min {id{x(’^)),Cd) 


100 




Combining them with Cg = Qx^i^i-f^R/e) ’ have 


(6.18) 


- 3cglog(17nii/e) - ^ + ln(l + /?) 
> 7'e(r(fc),x{'=))(®^^^) -Cd + ln(l + /?) 

where f3 = Ca for the case of addition and fd = —Cd for the case of removal. 


□ 
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Theorem 26. For Ca = , ca = , Ce = 6 in(mR/e) > ^ ^ 

enough universal eonstant C, then we have 

) — Pe{T(^') ) 2Q11 j^gll 

where (5 = 1 for the ease of addition and (3 = 0 for the case of removal. 

Proof. It is easy to see that these parameters satisfy the requirements of Lemma 25. □ 


6.5 Guarantees of the Algorithm 


In this section we put everything together to prove Theorem 31, the main result of this section, 
providing the guarantees of our cutting plane method. 

For the remainder of this section we assume that Ca = Q = Ce = 6 in(i 7 nR/e) ’ 

CA = log^fii) = PW' Consequently, throughout the algorithm we have 

||x ||2 < Gy^ n/X + 2^/nR = + 2y/nR < 3\/nR. (6.19) 

Lemma 27. If s some i and k during our Cutting Plane Method then 

max {di,y) — min {di,y) < -. 

yePWnB^iR) yeP('=)nBoo(iJ) CaCe 

Proof. Let y G P^^'l CiBaoiR) be arbitrary. Since y G B^oiR) clearly ||? 7||2 < nR^. Furthermore, by 
Lemma 22 and the choice of parameters mce + n < 3n. Consequently, by Lemma 19 and the fact 

|2 


that A = and Ca < 1 we have 


I II 12mce + 6n + 2 A|| 2 /|L 30n + 2^ 4n 

I " MH(a;) — 


y/ce + y{x) 


•y/ce + y-{x) C-a^fc, 


and therefore 


s-s(y)) 


1 

< — 
oo \ Cp 


-s(y)) 


< 


4n 


Cel+^ CaC^ 

Consequently, we have (1 — -^)si{x^^^) < Si{y) < (1 + -^)si{x^^^) for all y G n Boo{R). □ 

Now let us show how to compute a proof (or certificate) that the feasible region has small width 
on the direction Oj. 

Lemma 28. Suppose that during some iteration k fori = argmin^- Sj{x^^'l) we have Si{x^^^) < e. Let 
{x^,T^) = Centering{x^^\T^^\64:log{2R/e),Cj\) where is the r at that point in the algorithm 
and let 

- j. ^ /s(x*)A /ce + ej(f*,r*) + V’i(x*)' 

a = ytjOj where tj = ' ' ' 


s{x*)jj Vce + ei(f*,n) + V’i(x*) 


Then, we have that I Oi + a*|L < 

II 11 Z CaCexC 

and tj > 0 for all 

/0{n) \ 

. ^ 0{n) 

1 ^ 

tjbj < 




3n 
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Proof. By Lemma 14 and Lemma 21 we know that e(x*,r*) < ^Ce and 

Since e(x*,r*) < ^Ce, we have tj > 0 for all j. Furthermore, by Lemma 20 and (6.19), we then 
have that with high probability in n 


I Oj + a* 112 < 


2e 


(Ce + /U(f*)) 


|f*|L + <5e(®*) 


mce + n 


+ A 


2e 

< — 
Ce 

Ce 


1 , e 3n n 

{3^/nR) + + 


Cai?2 


RVe^ CaR? 


3y/n y/3n 2^/n 
CaR 4 ^/cfR 


Hence, we have 


Oj + a* L < 


8^/ne 


2 CnCf.R' 


By Lemma 21 we know that e(x*,r*) < 4ce and hence 


0(n) ^ 

, 0(n) 

0(n) 

OH . 


f* - ^ tjbj 


II 

C« 

M 

. y 


j¥=i 

jH ^ 



OH 

< Si(®*) ^ 

( §Ce + lfj{x^)\ 



\ce + ^/>j(x*) j 


Ce + ei(f*,T*) +'ij)i{x^) 
0(n) 


Ce + Cj (f *, T* ) + V'j ) 

3mce + 2n\ 


j¥=i 


3n 

, < — 
Ce / Cg 


□ 


Lemma 29. During our Cutting Plane Method, ifpg{x^^^) > nlog(^)+|^, then we have Si{x^^'^) < 
e for some i. 


Proof. Recall that 

Psix'^'"^) = - ^ (ce + ei)logSi(f(''))i + ^logdet A + Al) + 

i£[m] 


Using < 3y/nR (6.19) and A = - r^, we have 


Pe(^^^^) < 


^ (ce + Ci) log(s(f(^))i) + ^ logdet A + Al) 

ie[m] 


+ 


5re 

Ca ' 


Next, we note that ||ei||^ < Ce < i 2 in(i 7 nfl/e) ^ Vl^JnjX + ^^/nR + ^ < 6^/nR 

(Lemma 24). Hence, we have 

Pe(®^^^) < J logdet A + Al) + —. 

Z \ ' C-a 

Since pgfx^^^) > nlog(^) + we have ^ logdet ^A^Sy^^ A + AI^ > nlog(^). Using e < R, we 
2 2 

have that > \ + A and hence 

e 

^log Aj (A^S"^A + AI) > nlog + A^ . 
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Therefore, we have log Amax + Al) > log + A). Hence, we have some unit vector v 

such that vA'^S~'^Av + XiFv > ^ + A. Thus, 

(AfT). n 


Therefore there is some i such that > \. 

{ai,v)'^ > and hence s{x^^^)i < e. 


Since Si and v are unit vectors, we have 1 > 

□ 


Lemma 30. With constant probability, the algorithm ends in 10^^nlog( —) iterations. 


Proof. Theorem 26 shows that for all k 


EPe(r(fc+i),a-(fe+l))(®^^^^^) > ^’e(rW,xW) ^ (6-20) 

where /? = 1 for the case of adding a constraint and /3 = 0 for the case of removing a constraint. 
Now, for all t consider the random variable 




IQii 


3.5t 

IQii 


where is the number of constraints in iteration t of the algorithm. Then, since = 

mSl — 1 + 2/3, (6.20) shows that 


EXi+i - 


= X.- 


1 9/3 

+ 


IQii IQii 


1 , 9/3 4.5m(‘+^) 3.5(t + l) 

10^ 10^ HTI IQii 

4.5(-l + 2/3) 3.5 

IQii iQii “ *• 


Hence, it is a sub-martingale. Let r be the iteration the algorithm throws out error or outputs 
Optional stopping theorem shows that 


EXniin(.r,i) > EXq. (6-21) 

Using the diameter of P^^'^ is ^JnR, we have 

pg(d) = - CelogSj(O)^logdet (A^Sq^A-F AI)^||0||2 

> -Cem^°hog( y/nR) -b ^ log ^ 

> — -b log(-vAi.R)- 

Using Ce = and = 2n, we have 

Xo > - (^n -b log(\/ni?) - 

> -nlogZ-s/nR) — lOOre. 

"^We have made no effort on improving this constant and we believe it can be improved to less than 300 using 
techniques in [5, 6]. 
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Therefore, (6.21) shows that for all t we have 


- n{log{nR) + 100) < EXmin(r,0 

= pE [Xi„in(r,t)l'r < t] + (1 - p)E [Xi„in(r,t)l'r > t] 


( 6 . 22 ) 


where p = P(t < t). 
Note that 


E [Xi„in(r,t)k > i] < E 
< E 


Pe(rW,5-W)(^^*^)l'r > t 
Pe(rW,xW)(^^*^)l'r > t 


4.5mW 3.5t 

1011 Toil' 

3.5t 


1011' 


Furthermore, by Lemma 29 we know that when 

that is too small and the algorithm terminates. Hence, we have 

^ , n , , n , Ore 3.5t 

E [X^in(r,t)|r > t] < relog( —) + — - 

The proof of Lemma 21 shows that the function value does not change by more than 1 in one 
iteration by changing x and can change by at most mce log(^^^) by changing r. Since by Lemma 22 
we know that m < 1 + ^ and Ce = 6 in(i 7 ni;/£) ’ have that pg(x) < relog(^) + ^ throughout 
the execution of the algorithm. Therefore, we have 


E [Xmin(r,t)l'r < t] < Er<tPg(f(t) < nlog{ -) + 

C-af- 

Therefore, (6.22) shows that 


7re 

Ca ' 


-n{\og{nR) + 100) < relog ( ^ 


Hence, we have 


3.5t 


( 71 \ TtL 

— ) H-h re(log(rei?) + 100) 

Cae/ Ca 


< re log 
= re log 


Rn? \ 7n 

-) H-h lOOre 

CaC J Ca 

Rn^ 


CaC 


+ 8 • 10^°re. 


Thus, we have 


P(t <t)=p>l — lO^^relog 


n^R 


+ lO^^re ) . 


□ 


Now, we gather all the result as follows: 

Theorem 31 (Our Cutting Plane Method). Let K C be a non-empty set eontained in a box 
of radius R, i.e. K C B^{R). For any e G (0,72) in expected time 0(reSOQ(^/^)(iL) log(rei?/e) + 
re^ log'^(i)(re72/e)) our eutting plane method either outputs x G K or finds a polytope P = {x : 
Ax >b}DK sueh that 
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1. P has 0{n) many constraints (i.e. A G 5 g |RO(n) 


2. Each constraint of P is either an initial constraint from Bao{R) or of the form {a,x) >b — 6 
where {a,x) > b is a normalized hyperplane (i.e. ||a ||2 = Ij returned by the separation oracle 

and 5 = • 


3. 


The polytope P has small width with respect to some direction ai given by one of the con¬ 
straints, i.e. 

max {di,v) — min (ai^ij) < O (neln(R/e)) 
yePnBoo(i?)' ^ j?ePnSoo(P)' ^ ^ 


4 . Furthermore, the algorithm produces a proof of the fact above involving convex combination 
of the constraints, namely, non-negatives t 2 , ...,tQ(^n) o,nd x £ P such that 


(a) ||x||2 < 3^/nR, 


(b) 


+ Z] 


0(n) 

i=2 


tiOi 


2 


O {j^y/n\og{R/e)), 


(c) ci{x — bi < e, 

(d) {Y.?= 2 ^ ® ^ibi < 0{nelog{R/e)) 


Proof. Our algorithm either finds x G A or we have Si{x^^'^) < e. When Si{x^^^) < e, we apply 
Lemma 28 to construct the polytope P and the linear combination Y2f=2^ tidi. 

Notice that each iteration of our algorithm needs to solve constant number of linear systems 
and implements the sampling step to hnd A*'*') G IR"' s.t. E[A(^)] = '0(x Theorem 33 

shows how to do the sampling in 0(1) many linear systems. Hence, in total, each iterations needs 
to solve 0(1) many linear systems plus nearly linear work. To output the proof for (4), we use 
Lemma 28. 

Note that the linear systems the whole algorithm need to solve is of the form 


(A^S-2A + AI)-if=y. 


where the matrix A^ 
matrix 


-r-T^-r- 


S 3 , A + AI can be written as A DA for the matrix A = [A I] and diagonal 


D = 


S-2 0 

0 AI 


Note that Lemma 14 shows that || (S*^0) ^ ^ for the and {k + 1)*^ linear 

systems we solved in the algorithm. Hence, we have || ^ In [76], 

they showed how to solve such sequence of systems in O(n^) amortized cost. Moreover, since our 
algorithm always changes the constraints by 6 amount where 5 = H(;^) an inexact separation 
oracle S 0 f 2 (e/^) suffices, (see Def 1). Consequently, the total work 0 (nS 0 f 2 (e/^)(A) log(ni2/e) + 
log^*'^^(ni?/e)). Note that as the running time holds with only constant probability, we can 
restart the algorithm whenever the running time is too large. 

To prove (2), we note that from the algorithm description, we know the constraints are either 
from Boo {R) or of the form > d"^x^^^ — 5 where 


5 


ar(A^STl,A + AI)-L 
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From the proof of Lemma 29, we know that if Amax(A^S 2 .^A + AI) > then there is Sj < e. 
Hence, we have Amin((A^S“^A + AI)“^) < Since a is a unit vector, we have 

/ a^(A^S-f,)A + AI)-ia ^ 

V Ca ~ nCa 

□ 


7 Technical Tools 

In this section we provide stand-alone technical tools we use in our cutting plane method in Sec¬ 
tion 6. In Section 7.1 we show how to efficiently compute accurate estimates of changes in leverage 
scores using access to a linear system solver. In Section 7.2 we study what we call the “Stochastic 
Chasing 0 Game” and show how to maintain that a vector is small in norm by making small 
coordinate updates while the vector changes randomly in £ 2 . 


7.1 Estimating Changes in Leverage Scores 

In previous sections, we needed to compute leverage scores accurately and efficiently for use in our 
cutting plane method. Note that the leverage score definition we used was 

= iJVWA (A^WA AI)"^ A^Vwii 

for some A > 0 which is different from the standard definition 

a{w)i = iJVWA (A^WA)^^ A'^Vwii. 

However, note that the matrix A^WA -|- AI can be written as A DA for the matrix A = [A I] 
and diagonal matrix 

' W 0 
0 AI 

and therefore computing ip is essentially strictly easier than computing typical leverage scores. 
Consequently, we use the standard definition a to simplify notation. 

In [99], Spielman and Srivastava observed that leverage scores can be written as the norm of 
certain vectors 


D = 


a{w)i = 


Vwa(a'^wa) ^ A^Vwi. 


and therefore leverage scores can be approximated efficiently using dimension reduction. Unfortu¬ 
nately, the error incurred by this approximation is too large to use inside the cutting point method. 
In this section, we show how to efficiently approximate the change of leverage score more accurately. 

In particular, we show how to approximate a{w) — cr{v) for any given w,v with || log{w) — 
log(i ;)||2 'C 1. Our algorithm breaks (j{w)i — a{v)i into the sum of the norm of small vectors and 
then uses the Johnson-Lindenstrauss dimension reduction to approximate the norm of each vector 
separately. Our algorithm makes use of the following version of Johnson-Lindenstrauss. 

Lemma 32 ([!]). Let 0 < e < ^ and let xi,...,Xrn £ K"' be arbitrary m points. For k = 
0(e“^ log(m)) let Q be a k X n random matrix with each entry sampled from 

formly and independently, 
we have that for all i G [m] 


2 II 112 

formly and independently. Then, E ||Qxj|| = xd for all i G [m] and with high probability in m 


(1 - e)||fi||^ < IIQxdl^ < (1 + e)|| 


Xi\ 
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Algorithm 3: h = LeverageChange(A, u, u;, e, a) 

Input: A G v,w £ IR™q, e G (0,0.5). 

Given: ||V“^(i; — ui )||2 < 3 ^ and A^VA and A^WA are invertible. 

Sample G ^iog(m))xn g^g Lemma 32. 

Let di = ||Qrf\/WA (A^WA) ^ A'^ljH^ for all i £[n]. 

Let t = O (log(e“^)). 

Sample Q/ G ^iog(mt))xn gg ij^Yoma, 32. 

Pick positive integer u randomly such that Pr[M = i] = (^)*. 
for j G {1, 2, • • • , t} U {t + tt} do 
if j IS even then 

Let = ||Q/\/VA (A^VA)”^ (^A^ (V - W) A (A^VA)"^) " 
else 

Let A+ = (V - W)+, i.e. the matrix V - W with negative entries set to 0 . 
Let A- = (W - V)+, i.e. the matrix W - V with negative entries set to 0 . 
Let = ||Q/\/^A(A^VA)-i(A^ (V - W) A (A^VA)”^)^ A^liH^. 
Let = ||Q/\/A^A(A^VA)-i(A^(V-W)A(A^VA)-i)^A^li|| 2 . 

Let 

end 

end 

Let/, = 2 “i'(*+“) + X:^F- 

Output: hi = {wi — Vi)di + Vifi. for all i G [m] 


Theorem 33. Let A G and v,w £ R^q be such that a ||V ^{v — wi )||2 < 3 ^ OLnd both 

A'^VA and A^WA are invertible. For any e £ (0,0.5), Algorithm 3 generates a random variable h 
such that E/i = a{w) — (t{v) and with high probability in m, we have \\h — {cr{w) — cr{v)) II 2 < O (ae). 
Furthermore, the expected running time is 0((nnz(A) +LO)/e^) where LO is the amount of time 
needed to apply (A'^VA) and (A'^WA) to a vector. 

- "(?) ^(j) ^(j) 

Proof. First we bound the running time. To compute di, f /3y\ we simply perform matrix 

multiplications from the left and then consider the dot products with each of the rows of A. Naively 
this would take time 0{{t + tt)^ log(mt)(nnz(A) + LO)). However, we can reuse the computation 
in computing high powers of j to only take time 0{{t + u) log(mt)(nnz(A) + LO)). Now since E[r(] 
is constant we see that the total running time is as desired. It only remains to prove the desired 
properties of h. 

First we note that we can re-write leverage score differences using 


a{w)i - a{v)i = (wi - Vi) 


A(A^ WA) 


+ Vi 


A ((A^WA) ^ - (A^VA) M A^ 


-1 


Consequently, for all i £ [m ], if we let 


di 

h 


= ifA(A^WA) ^A^li, 


ifA 


(A^WA) ^ - (A^VA) ^ 


A^li. 


a{w)i - a{v)i = {wi - Vi)di + {vi)fi 


then 
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We show that di approximates d and /j approximate / well enough to satisfy the statements in the 
Theorem. 

First we bound the quality of di. Note that di = ||-v/WA (A^WA) A^ljH^. Consequently, 
Lemma 32 shows that E[dj] = di and that with high probability in m we have (1—e)dj < di < (l+e)dj 
for all i G [m]. Therefore, with high probability in m, we have 


(w - v)d- {w - v)d\\l= ^ {wi - Vif (^di - d^ - “ Vifd: 


2^2 
i 


= e 


iS|m 
,2 


is m 


{Wi - Vif 


is m 


a{w)i 


Wi 


< 


2'" E 

is [ml 


Wi - Vi 


Next we show how to estimate /. Let X (A^VA) A"^ (V — W) A (A'^VA) By the 
assumption on a we know — -< V — W V and therefore -2l-< X -< jl- Consequently we 
have that 

(A'^WA)= (A'^VA)(I - X)"^ (A^VA) 

OO 

= Y (A^VA) X^' (A^VA) . 

j=0 

and therefore 

fi = lfA (A^VA)“^^^ X^' (A^VA)“^^^ - (A^VA)^^ j A^l* 

OO 

= Y1 ^ ( A"^ VA) X^' (A^ VA) A^li . 

i=i 


Furthermore, using the dehnition of X we have that for even j 



X2 (A'^VA) A^li 

(A^VA)”^^^ (pj (V - W) A (A'^VA)"^) " A^l* 


VVA {A^YA) ^ (^A^ (V - W) A (A'^VA) " A'^I* 


2 

2 


For odd j, using our definition of A+ and A we have that 

= if A (A'^VA) X^' (A^VA) A^li 

= if A ( A'^VA) A'^ (V - W) a) ^ (A^ VA) A^ (V - W) 

X A (A^VA)“^ ( a ^ (V - W) a (A^VA)”^) ~ A^l* 
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where 


ct- = 

jj(3) 


\/A+A (A'^VA) ^ (W - V) A (A'^VA) " A^l^ 

\/A^A (A'^VA)'^ {a^ (W - V) a (A'^VA)"^) ^ A^li 


2 

2 

2 

2 


Consequently, by Lemma 32 and the construction, we see that 


E/i = E 


^ ^ ou 

f(i) _L 

/ ^ ^ 2^ ^ ^ 
j=\ u=l 



h 


and therefore E/i = a{w) — <j{v) as desired. All that remains is to bound the variance of /j. 

To bound the variance of /, let |X| = (A^VA) A^ | W — V| A (A^VA) Note that 
— \l < — |X| ^ X ^ |X| ^ \l and consequently for all j 


g(j) drf £T^ (A^VA) |XP' (A^VA) A^l 


>-1/2 


-1/2 aT; 


1 


<^lfA(A^VA) ^^^|X|(A^VA) 


-1/2 aT. 


4? 

def 1 


V. 




where = \/VA (A^VA) A'^Vy and A IS 3) dlQj^OriBjl ITlSjijllX W^lth 
that 0 ^ P-f; ^ I, we have that for all j 


. Using 


i=l 


(4J-1)2 ^ =TV(P,AP^P,AP,) 

i=l 

< Tr(P^AAP^) = Tr(AP^P^A) 

m 

< Tr (A^) = 1 ) < a" 


2=1 


and thus | 

2 S 


(i) 

< gl and 

d“ 


Aol 


\YfU)_Yp^\\l = Y 


2 ( i{j) Aj) 


< 


< 


I yji fi 

i 

2^Y.vA{A>)\ 


V? ( a.«> - o.«'L + 2 T „.q A - nf '' 


U)V^ , fad)'"- 


2^2 
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Putting this all together we have that 

t OO 

II V/ - Yf\ < ||2“V/(*+“) + 

i=i i=i 

t OO 

< 2“||V/(*+“)||2 + \\Yp^ - Yf^'>\\^ + Y l|W"^^'^ll2 

j=i j=t+i 

< 2^ I \/2ae ^ 4a 

— 4t+M Z—^ 4i—1 Z^ 4i 
j=i j=t+i 

Consequently, since t = 0(log(e“^)) we have the desired result. □ 

7.2 The Stochastic Chasing 0 Game 

To avoid computing leverage scores exactly, in Section 7.1 we showed how to estimate the difference 
of leverage scores and use these to update the leverage scores. However, if we only applied this 
technique the error of leverage scores would accumulate in the algorithm and we need to fix it. 
Naturally, one may wish to use dimension reduction to compute a multiplicative approximation to 
the leverage scores and update our computed value if the error is too large. However, this strategy 
would fail if there are too many rows with inaccurate leverage scores in the same iteration. In 
this case, we would change the central point too much that we are not able to recover. In this 
section, we present this update problem in a general form that we call Stochastic Chasing 0 game 
and provide an effective strategy for playing this game. 

The Stochastic chasing 0 game is as follows. There is a player, a stochastic adversary, and a 
point X G IR™'. The goal of the player is to keep the point close to 0 G R™' in norm and the goal 
of the stochastic adversary is to move x away from 0. The game proceeds for an infinite number 
of iterations where in each iteration the stochastic adversary moves the current point G R™ to 
some new point x^^'^ + G R™' and the player needs to respond. The stochastic adversary cannot 
move the arbitrarily, instead he is only allowed to choose a probability distribution and 
sample A^^^ from it. Furthermore, it is required that = 0 and HAjl^ < c for some fixed c 

and all A G The player does not know or the distribution or the move A^^^ of the 

stochastic adversary. All the player knows is some G R” that is close to x^^^ in ioo norm. With 
this information, the player is allowed to choose one coordinate i and set x- ' to be zero and for 
other j, we have 

The question we would like to address is, what strategy the player should choose to keep x^^'^ 
close to 0 in i^o norm? We show that there is a trivial strategy that performs well: simply pick the 
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largest coordinate and set it to 0. 

Algorithm 4: Stochastic chasing 0 game 
Constant: c > 0, i? > 0. 

Let = 0 E R™. 
for k = 1 to oo do 

Stochastic Adversary: Pick such that E^(fc)A = 0 and HAjj^ < c all A E 
Stochastic Adversary: Pick E R™' such that < R- 

Player: Pick a coordinate using only 
Sample A*^^) from 

Set = 0 and for all j / i. 

end 


Theorem 34. Using the strategy = argmaxj 


,(k) 


, with probability at least 1 — p, we have 


11®*-^^ 11^ < 2(c + R) log {Amk^/p) 
for all k in the Stoehastie Chasing 0 Game. 

Proof. Consider the potential function <h(x) = where a is to be determined. 

Now for all x we know that < 1 + x + and therefore for all |(5| < c, x and a, we have 


^ax+aS < ^ ^ 1 ^2^2gax+|a|c 


Consequently, 


+ A) < I E - E A, 


i is m 


is m 


. CK o A 1^ 
H-e II llooR 

2 


AevW 1 X] + X] ® 

i£[m] i£[m 


(fe) . O -axf"'’ /\2 


Since E^(fc) A = 0 and A „ A c, we have E 


Ae©('=) (Si Ai - e A*) = 0 and 


E 


^ ' A^ + ^ e ^ A^ ] < ' + maxe 




Jfc) 


A&VW 


(fc) 


V * 


r> / (k) (k) 

< ( max + max e“"^» 

\ i i 


Letting = max ^maxj ^, maxj e , we then have 

E^g^(,)$(xW + A) < $(x('=)) + 

Since = argmaxj y^^^ and A R-, the player setting = 0 decreases 4* by 

at least Hence, we have 


43 






Picking a = 2[(Xr) ’ have +R))'^ < 1 and hence a^e“‘^c^ < e "(^+'=). Therefore, we 

have that 

< ... < = 2m. 

Consequently, by Markov’s inequality we have that Pr[<l>(T*^^)) > A^] < ^ for any A^. Furthermore, 
since clearly $(x) > we have that Pr[||x*^^) ||oo > log(Afc)/a] < ^ for all k. Choosing 

Afc = and taking a union bound over all k, we have that 

IIj^ < 2(c + R) log [Amk"^/p) 

for all k with probability at least 


i-E 


2m 

Afc 




□ 
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Part II 


A User’s Guide to Cutting Plane Methods 

8 Introduction 

Cutting plane methods have long been employed to obtain polynomial time algorithms for solving 
optimization problems. However, for many problems cutting plane methods are often regarded 
as inefficient both in theory and in practice. Here, in Part H we provide several techniques for 
applying cutting plane methods efficiently. Moreover, we illustrate the efficacy and versatility of 
these techniques by applying them to achieve improved running times for solving multiple problems 
including semidefinite programming, matroid intersection, and submodular flow. 

We hope these results revive interest in ellipsoid and cutting plane methods. We believe these 
results demonstrate how cutting plan methods are often useful not just for showing that a problem 
is solvable in polynomial time, but in many yield substantial running time improvements. We stress 
that while some results in Part H are problem-specific, the techniques introduced here are quite 
general and are applicable to a wide range of problems. 

In the remainder of this introduction we survey the key techniques we use to apply our cutting 
plane method (Section 8.1) and the key results we obtain on improving the running time for solving 
various optimization problems (Section 8.2). We conclude in Section 8.3 by providing an overview 
of where to find additional technical result in Part H. 

8.1 Techniques 

Although cutting plane methods are typically introduced as algorithms for finding a point in a 
convex set (as we did with the feasibility problem in Part I), this is often not the easiest way 
to apply the methods. Moreover, improperly applying results on the feasibility problem to solve 
convex optimization problems can lead to vastly sub-optimal running times. Our central goal, here, 
in Part H is to provide tools that allow cutting plane methods to be efficiently applied to solve 
complex optimization problems. Some of these tools are new and some are extensions of previously 
known techniques. Here we briefly survey the techniques we cover in Section 10 and Section 11. 

Technique 0: From Feasibility to Optimization 

In Section 10.1, we explain how to use our cutting plane method to solve convex optimization 
problems using an approximate subgradient oracle. Our result is based on a result of Nemirovski [85] 
in which he showed how to use a cutting plane method to solve convex optimization problems 
without smoothness assumptions on the function and with minimal assumptions on the size of the 
function’s domain. We generalize his proof to accommodate for an approximate separation oracle, 
an extension which is essential for our applications. We use this result as the starting point for two 
new techniques we discuss below. 

Technique 1: Dimension Reduction through Duality 

In Section 10.2, we discuss how cutting plane methods can be applied to obtain both primal and 
dual solutions to convex optimization problems. Moreover, we show how this can be achieved while 
only applying the cutting plane method in the space, primal or dual, which has a fewer number of 
variables. Thus we show how to use duality to improve the convergence of cutting plane methods 
while still solving the original problem. 
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To illustrate this idea consider the following very simple linear program (LP) 

n 

min > WiXi 
1=1 

where T G IR" and uJ G R”. Although this LP has n variables, it should to be easy to solve purely 
on the grounds that it only has one equality constraint and thus dual linear program is simply 

max y , 

y<Wiii 

i.e. a LP with only one variable. Consequently, we can apply our cutting plane method to solve it 
efficiently. 

However, while this simple example demonstrates how we can use duality to decrease dimen¬ 
sions, it is not always obvious how to recover the optimal primal solution x variable given the 
optimal dual solution y. Indeed, for many problems their dual is significantly simpler than itself 
(primal), so some work is required to show that working in the space suffices to require a primal 
solution. 

One such recent example of this approach proving successful is a recent linear programming 
result [75]. In this result, the authors show how to take advantage of this observation and get a 
faster LP solver and maximum flow algorithm. It is interesting to study how far this technique can 
extend, that is, in what settings can one recover the solution to a more difficult dual problem from 
the solution to its easier primal problem? 

There is in fact another precedent for such an approach. Grotschel, Lovasz and Schrijver[50] 
showed how to obtain the primal solution for linear program by using a cutting plane method to 
solve the linear program exactly. This is based on the observation that cutting plane methods are 
able to find the active constraints of the optimal solution and hence one can take dual of the linear 
program to get the dual solution. This idea was further extended in [69] which also observed that 
cutting plane methods are incrementally building up a LP relaxation of the optimization problem. 
Hence, one can find a dual solution by taking the dual of that relaxation. 

In Section 10.2, we provide a fairly general technique to recover a dual optimal solution from 
an approximately optimal primal solution. Unfortunately, the performance of this technique seems 
quite problem-dependent. We therefore only analyze this technique for semidefinite programming 
(SDP), a classic and popular convex optimization problem. As a result, we obtain a faster SDP 
solver in both the primal and dual formulations of the problem. 

Technique 2: Using Optimization Oracles Directly 

In the seminal works of Grötschel, Lovász and Schrijver and, independently, Karp and Papadimitriou
[49, 64], the authors showed the equivalence between optimization oracles and separation oracles, and gave
a general method to construct a separation oracle for a convex set given an optimization oracle for
that set, that is, an oracle for minimizing linear functionals over the set. This seminal result led
to the first weakly polynomial time algorithms for many problems such as submodular function
minimization. Since then, this idea has been used extensively in various settings [62, 16, 17, 23].

Unfortunately, while this equivalence of separation and optimization is a beautiful and powerful
tool for establishing polynomial time solvability of problems, in many cases it may lead to inefficient algorithms.
In order to use this reduction to get a separation oracle, the optimization oracle may need to
be called many times (essentially as many times as needed to run a cutting plane method),
which may be detrimental to obtaining small asymptotic running times. Therefore, it is an
interesting question whether there is a way of using an optimization oracle more directly.
In Section 11 we provide a partial answer to this question for a broad class of
problems that we call the intersection problem. For these problems we demonstrate how to achieve
running time improvements by using optimization oracles directly. The problem we consider is
as follows. We wish to solve the problem $\max_{x\in K}\langle c, x\rangle$ for some cost vector $c\in\mathbb{R}^n$ and convex set $K$. We
assume that the convex set $K$ can be decomposed as $K = K_1\cap K_2$ such that $\max_{x\in K_1}\langle c, x\rangle$ and
$\max_{x\in K_2}\langle c, x\rangle$ can each be solved efficiently. Our goal is to obtain a running time for this problem
comparable to that of optimizing a linear function over $K$ given only a separation oracle for it.

We show that by considering a carefully regularized variant, we obtain a problem such that
optimization oracles for $K_1$ and $K_2$ immediately yield a separation oracle for this regularized
problem. By analyzing the regularizer and bounding the domains of the problem, we are able to
show that this allows us to efficiently compute highly accurate solutions to the intersection problem
by applying our cutting plane method once. In other words, we do not need to use a complicated
iterative scheme or directly invoke the equivalence between separation and optimization, and thereby
save $\mathrm{poly}(n)$ factors in our running times.

We note that this intersection problem can be viewed as a generalization of the matroid
intersection problem, and in Section 11.2 we show our reduction gives a faster algorithm in certain
parameter regimes. As another example, in Section 11.3 we show our reduction gives a substantial
polynomial improvement for the submodular flow problem. Furthermore, in Section 11.4 we show
how our techniques allow us to minimize a linear function over the intersection of a convex set and
an affine subspace in a number of iterations that depends only on the codimension of the affine
space.

8.2 Applications 

Our main goal in Part II is to provide general techniques for efficiently using cutting plane methods
for various problems. Hence, in Part II we use minimally problem-specific techniques to achieve
the best possible running time. However, we also demonstrate the efficacy of our approach by
showing how these techniques improve upon the previous best known running times for solving several
classic problems in combinatorial and continuous optimization. Here we provide a brief overview
of these applications, previous work on these problems, and our results.

In order to avoid deviating from our main discussion, our coverage of previous methods and
techniques is brief. Given the large body of prior work on SDP, matroid intersection and submodular
flow, it would be impossible to have an in-depth discussion of all of them. Therefore, this
section focuses on running time comparisons and explanations of relevant previous techniques.

Semidefinite Programming 

In Section 10.2 we consider the classic semidefinite programming (SDP) problem: 

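$$\max_{X\succeq 0} C\bullet X \ \text{ s.t. } A_i\bullet X = b_i \quad\text{(primal)} \qquad\qquad \min_{y} b^\top y \ \text{ s.t. } \sum_{i=1}^{n} y_iA_i\succeq C \quad\text{(dual)}$$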

where $X, C, A_i$ are $m\times m$ symmetric matrices, $b, y\in\mathbb{R}^n$, and $A\bullet B := \mathrm{Tr}(A^\top B)$. For many
problems, $n\ll m^2$ and hence the dual problem has fewer variables than the primal. There are
many results and applications of SDP; see [106, 101, 83] for surveys on this topic. Since our focus
is on polynomial time algorithms, we do not discuss pseudo-polynomial algorithms such as the
spectral bundle method [51], multiplicative weight update methods [8, 9, 61, 3], etc.
Authors                      Year   Running time
Nesterov, Nemirovsky [89]    1992   [illegible in source]
Anstreicher [7]              2000   [illegible in source]
Krishnan, Mitchell [70]      2003   $\tilde{O}(m(n^{\omega} + m^{\omega} + S))$ (dual SDP)
This paper                   2015   $\tilde{O}(m(m^{2} + n^{\omega} + S))$

Table 8: Previous algorithms for solving an n x n SDP with m constraints and S non-zero entries


Currently, there are two competing approaches for solving SDP problems, namely interior point
methods (IPMs) and cutting plane methods. Typically, IPMs require fewer iterations than cutting
plane methods; however, each iteration of these methods is more complicated and possibly more
computationally expensive. For SDP problems, interior point methods require the computation of
the Hessian of the function $-\log\det\left(\sum_{i=1}^{n} y_iA_i - C\right)$, whereas cutting plane methods usually only
need to compute minimum eigenvectors of the slack matrix $\sum_{i=1}^{n} y_iA_i - C$.

In [7], Anstreicher provided the current fastest IPM for solving the dual SDP problem using
a method based on the volumetric barrier function. This method takes $O((mn)^{1/4})$ iterations
and each iteration is as cheap as in usual IPMs. For general matrices $C, X, A_i$, each iteration takes
$O(nm^{\omega} + \ldots)$ time, where $\omega$ is the fast matrix multiplication exponent. If the constraint
matrices $A_i$ are rank one matrices, the iteration cost can be improved to $O(m^{\omega} + nm^{2} + n^{2}m)$
[71]. If the matrices are sparse, then [40, 84] show how to use matrix completion inside the IPM.
However, the running time depends on the extended sparsity patterns, which can be much larger
than the total number of non-zeros.

In [70], Krishnan and Mitchell observed that the separation oracle for dual SDP takes only
$O(m^{\omega} + S)$ time, where $S = \sum_{i=1}^{n}\mathrm{nnz}(A_i)$ is the total number of non-zeros in the constraint
matrices. Hence, the cutting plane method of [105] gives a faster algorithm for SDP in many
regimes. For $\omega = 2.38$, the cutting plane method is faster when the $A_i$ are not rank one and the problem is
not too dense, i.e., when $\sum_{i=1}^{n}\mathrm{nnz}(A_i)$ is sufficiently small. While there are previous methods for using cutting
plane methods to obtain primal solutions [69], to the best of our knowledge, there is no worst-case
running time analysis for these techniques.

In Section 10.2, we show how to alleviate this issue. We provide an improved algorithm for finding
the dual solution and prove carefully how to obtain a comparable primal solution as well. See
Table 8 for a summary of the algorithms for SDP and their running times.

Matroid Intersection 

In Section 11.2 we show how our optimization oracle technique can be used to improve upon the
previous best known running times for matroid intersection. Matroid intersection is one of the most
fundamental problems in combinatorial optimization. The first algorithm for matroid intersection
is due to the seminal paper by Edmonds [26]. In Tables 9 and 10 we provide a summary
of the previous algorithms for unweighted and weighted matroid intersection as well as the new
running times we obtain in this paper. While there is no total ordering on the running times of
these algorithms due to the different dependence on various parameters, we would like to point
out that our algorithms outperform the previous ones in regimes where r is close to n and/or the
oracle query costs are relatively expensive. In particular, in terms of oracle query complexity our
algorithms are the first to achieve the quadratic bounds of $O(n^2)$ and $O(nr)$ for independence and
rank oracles. We hope our work will revive interest in this problem, on which progress has
largely stagnated for the past 20-30 years.
Authors               Year   Running time
Edmonds [26]          1968   not stated
Aigner, Dowling [2]   1971   $O(nr^2 T_{\text{ind}})$
Tomizawa, Iri [102]   1974   not stated
Lawler [72]           1975   $O(nr^2 T_{\text{ind}})$
Edmonds [28]          1979   not stated
Cunningham [21]       1986   $O(nr^{1.5} T_{\text{ind}})$
This paper            2015   $O(n^2\log n\, T_{\text{ind}} + n^3\log^{O(1)} n)$
                             $O(nr\log^2 n\, T_{\text{rank}} + n^3\log^{O(1)} n)$

Table 9: Previous algorithms for (unweighted) matroid intersection. Here n is the size of the ground
set, $r = \max\{r_1, r_2\}$ is the maximum rank of the two matroids, $T_{\text{ind}}$ is the time needed to check if
a set is independent (independence oracle), and $T_{\text{rank}}$ is the time needed to compute the rank of a
given set (rank oracle).


Authors                              Year   Running time
Edmonds [26]                         1968   not stated
Tomizawa, Iri [102]                  1974   not stated
Lawler [72]                          1975   $O(nr^2 T_{\text{ind}} + nr^3)$
Edmonds [28]                         1979   not stated
Frank [33]                           1981   $O(n^2 r(T_{\text{circuit}} + n))$
Orlin, Ahuja [91]                    1983   not stated
Brezovec, Cornuéjols, Glover [14]    1986   $O(nr(T_{\text{circuit}} + r + \log n))$
Fujishige, Zhang [39]                1995   [illegible in source]
Shigeno, Iwata [96]                  1995   [illegible in source]
This paper                           2015   $O((n^2\log n\, T_{\text{ind}} + n^3\log^{O(1)} n)\log nM)$
                                            $O((nr\log^2 n\, T_{\text{rank}} + n^3\log^{O(1)} n)\log nM)$

Table 10: Previous algorithms for weighted matroid intersection. In addition to the notation used
in the unweighted table, $T_{\text{circuit}}$ is the time needed to find a fundamental circuit and M is the bit
complexity of the weights.
Minimum-Cost Submodular Flow

In Section 11.3 we show how our optimization oracle technique can be used to improve upon the
previous best known running times for (minimum-cost) submodular flow. Submodular flow is
a very general problem in combinatorial optimization which generalizes many problems such as
minimum cost flow, graph orientation, polymatroid intersection, and directed cut covering [37]. In
Figure 8.1 we provide an overview of the previous algorithms for submodular flow as well as the
new running times we obtain in this paper.

Many of the running times are in terms of a parameter h, which is the time required for
computing an “exchange capacity”. To the best of our knowledge, the most efficient way of computing
an exchange capacity is to solve an instance of submodular minimization which previously took 
time 0(n‘^E0 -|- n®) (and now takes 0(n^E0 -|- n^) time using our result in Part III). Readers may 
wish to substitute h = 0(n^E0 -|- n^) when reading the table. 

The previous fastest weakly polynomial algorithms for submodular flow are by [59, 30, 32], which 
take time 0(n®E0 -|- n^) and 0{mn^\ognU ■ EO), assuming h = 0(n^E0 -|- n^). Our algorithm 
for submodular flow has a running time of $\tilde{O}(n^2\,\mathrm{EO} + n^3)$, which is significantly faster by roughly
a factor of O(n^). 

For strongly polynomial algorithms, our results do not yield a speedup, but we remark that
our faster strongly polynomial algorithm for submodular minimization in Part III improves the 
previous algorithms by a factor of O(n^) as a corollary (because h requires solving an instance of 
submodular minimization). 

8.3 Overview 

After covering some preliminaries on convex analysis in Section 9, we split the remainder
of Part II into Section 10 and Section 11. In Section 10 we cover our algorithm for convex
optimization using an approximate subgradient oracle (Section 10.1) as well as our technique of
using duality to decrease dimensions and improve the running time of semidefinite programming
(Section 10.2). In Section 11 we provide our technique for using minimization oracles to minimize
functions over the intersection of convex sets and provide several applications including matroid
intersection (Section 11.2), submodular flow (Section 11.3), and minimizing a linear function over
the intersection of an affine subspace and a convex set (Section 11.4).

9 Preliminaries 

In this section we review basic facts about convex functions that we use throughout Part II. We also
introduce two oracles that we use throughout Part II, namely subgradient and optimization oracles, and
provide some basic reductions between them. Note that we have slightly extended some definitions
and facts to accommodate the noisy separation oracles used in this paper.

First we recall the definition of strong convexity.

Definition 35 (Strong Convexity). A real-valued function $f$ on a convex set $\Omega$ is $\alpha$-strongly
convex if for any $x, y\in\Omega$ and $t\in[0,1]$, we have

$$f(tx + (1-t)y) + \frac{\alpha}{2}t(1-t)\|x - y\|_2^2\le t f(x) + (1-t) f(y).$$
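As a small sanity check of Definition 35 (with the $\alpha/2$ normalization written above), the function $f(x) = \|x\|_2^2$ is 2-strongly convex, and for it the defining inequality holds with equality. The Python sketch below evaluates the gap between the two sides at random points; the helper names are ours and purely illustrative.

```python
import random

def sq_norm(v):
    """f(v) = ||v||_2^2."""
    return sum(vi * vi for vi in v)

def strong_convexity_gap(f, x, y, t, alpha):
    """Return RHS - LHS of Definition 35; it is >= 0 when f is alpha-strongly convex."""
    z = [t * xi + (1 - t) * yi for xi, yi in zip(x, y)]
    diff = [xi - yi for xi, yi in zip(x, y)]
    lhs = f(z) + 0.5 * alpha * t * (1 - t) * sq_norm(diff)
    rhs = t * f(x) + (1 - t) * f(y)
    return rhs - lhs

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(3)]
    y = [random.uniform(-5, 5) for _ in range(3)]
    t = random.random()
    # For f = ||.||^2 and alpha = 2 the inequality is tight, so the gap is 0 up to rounding.
    assert abs(strong_convexity_gap(sq_norm, x, y, t, alpha=2.0)) < 1e-9
```

With $x = (1, 0)$, $y = (0, 1)$, $t = 1/2$ and $\alpha = 3$ the gap is $-1/4$, so this $f$ is not 3-strongly convex; the constant 2 is tight.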

Next we define an approximate subgradient. 
Authors

Years 

Running times 

Fujishige [35] 

1978 

not stated 

Grötschel, Lovász, Schrijver [49]

1981 

weakly polynomial 

Zimmermann [113] 

1982 

not stated 

Barahona, Cunningham [12] 

1984 

not stated 

Cunningham, Frank [22] 

1985 

→ $O(n^4 h\log C)$

Fujishige [36]

1987 

not stated 

Frank, Tardos [34] 

1987 

strongly polynomial 

Cui, Fujishige [108] 

1988 

not stated 

Fujishige, Röck, Zimmermann [38]

1989 

—7- 0(n^/ilog n) 

Chung, Tcha [18] 

1991 

not stated 

Zimmermann [114] 

1992 

not stated 

McCormick, Ervolina [82] 

1993 

0{rJh* log nCU) 

Wallacher, Zimmermann [109] 

1994 

0{n^h\ognCU) 

Iwata [52] 

1997 

0{n^h\og U) 

Iwata, McCormick, Shigeno [57] 

1998 

0 (n^/imin jlogreC, ti? logn}) 

Iwata, McCormick, Shigeno [58] 

1999 

0 (re^/imin jlognt/, log n|) 

Fleischer, Iwata, McCormick [32]

1999 

0 [n^h min jlog U, log n|) 

Iwata, McCormick, Shigeno [59] 

1999 

O [n^h min {log C, log n|) 

Fleischer, Iwata [30] 

2000 

0{mn^ log nlJ ■ EO) 

This paper 

2015 

$O(n^2\log nCU\cdot\mathrm{EO} + n^3\log^{O(1)} nCU)$


Figure 8.1: Previous algorithms for submodular flow with n vertices, maximum cost C and
maximum capacity U. The factor h is the time for an exchange capacity oracle, h* is the time for
a “more complicated exchange capacity oracle” and EO is the time for the evaluation oracle of the
submodular function. The arrow, →, indicates that the bound uses the currently best maximum submodular
flow algorithm as a subroutine, which was non-existent at the time of the publication.
Definition 36 (Subgradient). For any convex function $f$ on a convex set $\Omega$, the $\delta$-subgradients of
$f$ at $x$ are defined to be

$$\partial_\delta f(x) = \{g\in\mathbb{R}^n : f(y) + \delta\ge f(x) + \langle g, y - x\rangle \text{ for all } y\in\Omega\}.$$


Here we provide some basic facts regarding convexity and subgradients. These statements are
natural extensions of well-known facts regarding convex functions and their proofs can be found in
any standard textbook on convex optimization.

Fact 37. For any convex set $\Omega$ and any point $x$ in the interior of $\Omega$, we have the following:

1. If $f$ is convex on $\Omega$, then $\partial_0 f(x)\neq\emptyset$ and $\partial_s f(x)\subseteq\partial_t f(x)$ for all $0\le s\le t$.

2. If $f$ is a differentiable convex function on $\Omega$, then $\nabla f(x)\in\partial_0 f(x)$.

3. If $f_1$ and $f_2$ are convex functions on $\Omega$, $g_1\in\partial_{\delta_1} f_1(x)$ and $g_2\in\partial_{\delta_2} f_2(x)$, then $\alpha g_1 + \beta g_2\in\partial_{\alpha\delta_1 + \beta\delta_2}(\alpha f_1 + \beta f_2)(x)$ for all $\alpha, \beta\ge 0$.

4. If $f$ is $\alpha$-strongly convex on $\Omega$ with minimizer $x^*$, then for any $y$ with $f(y)\le f(x^*) + \varepsilon$, we have $\frac{\alpha}{2}\|x^* - y\|_2^2\le\varepsilon$.

Next we provide a reduction from subgradients to separation oracles. We will use this reduction 
several times in Part II to simplify our construction of separation oracles. 

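Lemma 38. Let $f$ be a convex function. Suppose we have $x$ and $g\in\partial_\delta f(x)$ with $\|x\|_2\le D$
and $\delta\le 1\le D$. If $\|g\|_2\le\frac{1}{2}\sqrt{\delta}$, then $f(x)\le\min_{\|y\|_2\le D} f(y) + 2\sqrt{\delta}D$, and if $\|g\|_2 > \frac{1}{2}\sqrt{\delta}$, then

$$\{\|y\|_2\le D : f(y)\le f(x)\}\subseteq\{y : \langle d, y\rangle\le\langle d, x\rangle + 2\sqrt{\delta}D\}$$

with $d = g/\|g\|_2$. Hence, this gives a $(2\sqrt{\delta}D, 2\sqrt{\delta}D)$-separation oracle on the set $\{\|x\|_2\le D\}$.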
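Proof. Let $y$ be such that $\|y\|_2\le D$. By the definition of $\delta$-subgradient, we have

$$f(y) + \delta\ge f(x) + \langle g, y - x\rangle.$$

If $\|g\|_2\le\frac{1}{2}\sqrt{\delta}$, then we have $|\langle g, y - x\rangle|\le\sqrt{\delta}D$ because $\|x\|_2\le D$ and $\|y\|_2\le D$. Since $\delta\le\sqrt{\delta}\le\sqrt{\delta}D$, this gives

$$\min_{\|y\|_2\le D} f(y) + 2\sqrt{\delta}D\ge f(x).$$

Otherwise, we have $\|g\|_2 > \frac{1}{2}\sqrt{\delta}$. For any $y$ with $f(y)\le f(x)$, we have $\delta\ge\langle g, y - x\rangle$ and hence

$$\left\langle\frac{g}{\|g\|_2}, y - x\right\rangle\le\frac{\delta}{\|g\|_2} < 2\sqrt{\delta}\le 2\sqrt{\delta}D.$$

□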


At several points in Part II we will wish to construct subgradient oracles or separation oracles
given only the ability to approximately maximize a linear function over a convex set. In the
remainder of this section we formally define such an optimization oracle and prove this equivalence.
Definition 39 (Optimization Oracle). Given a convex set $K$ and $\delta > 0$, a $\delta$-optimization oracle for
$K$ is a function on $\mathbb{R}^n$ such that for any input $c\in\mathbb{R}^n$, it outputs $y$ such that

$$\max_{x\in K}\langle c, x\rangle\le\langle c, y\rangle + \delta.$$

We denote by $\mathrm{OO}_\delta(K)$ the time complexity of this oracle.

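Lemma 40. Given a convex set $K$, any $\varepsilon$-optimization oracle for $K$ is an $\varepsilon$-subgradient oracle for
$f(c) = \max_{x\in K}\langle c, x\rangle$.

Proof. Let $x_c$ be the output of the $\varepsilon$-optimization oracle on the cost vector $c$. We have

$$\max_{x\in K}\langle c, x\rangle\le\langle c, x_c\rangle + \varepsilon.$$

Hence, for all $d$, since $x_c\in K$ we have $f(d)\ge\langle d, x_c\rangle$, and therefore

$$\langle x_c, d - c\rangle + f(c)\le\langle x_c, d\rangle + \varepsilon\le f(d) + \varepsilon.$$

Hence, $x_c\in\partial_\varepsilon f(c)$. □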

Combining these lemmas shows that having an $\varepsilon$-optimization oracle for a convex set $K$ contained
in a ball of radius $D$ yields an $O(\sqrt{D\varepsilon}, \sqrt{D\varepsilon})$-separation oracle for $\max_{x\in K}\langle c, x\rangle$. We use
these ideas to construct separation oracles throughout Part II.
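To make the chain from Lemma 40 to Lemma 38 concrete, here is a minimal Python sketch for the illustrative choice $K = [0,1]^n$, whose linear optimization oracle is exact. Its output is then a 0-subgradient, hence (by Fact 37) a $\delta$-subgradient, of $f(c) = \max_{x\in K}\langle c, x\rangle$, and the case split on $\|g\|_2$ follows Lemma 38 as stated above. The set $K$ and all function names are assumptions made for the example.

```python
import math

def box_optimization_oracle(c):
    """Exact linear optimization oracle for the illustrative set K = [0, 1]^n:
    returns a maximizer of <c, x> over K."""
    return [1.0 if ci > 0 else 0.0 for ci in c]

def support_function(c):
    """f(c) = max_{x in [0,1]^n} <c, x>, i.e. the sum of the positive parts of c."""
    return sum(max(ci, 0.0) for ci in c)

def separation_from_oracle(c, delta, D, oracle=box_optimization_oracle):
    """Lemma 40 followed by Lemma 38, assuming ||c||_2 <= D and delta <= 1 <= D.

    If ||g||_2 <= sqrt(delta)/2 the query is declared (2*sqrt(delta)*D)-optimal;
    otherwise return the halfspace {y : <d, y> <= offset}, d = g/||g||_2,
    which contains {y : ||y||_2 <= D, f(y) <= f(c)}."""
    g = oracle(c)
    norm_g = math.sqrt(sum(gi * gi for gi in g))
    if norm_g <= 0.5 * math.sqrt(delta):
        return ("near_optimal", None)
    d = [gi / norm_g for gi in g]
    offset = sum(di * ci for di, ci in zip(d, c)) + 2.0 * math.sqrt(delta) * D
    return ("cut", (d, offset))
```

For example, at $c = (-1, -2)$ the oracle returns $g = 0$ and the query is reported as near-optimal (indeed $f(c) = 0 = \min f$), while at $c = (0.3, -1)$ it returns a cut with normal $d = (1, 0)$.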

10 Convex Optimization 

In this section we show how to apply our cutting plane method to efficiently solve problems in
convex optimization. First, in Section 10.1 we show how to use our result to minimize a convex
function given an approximate subgradient oracle. Then, in Section 10.2 we illustrate how this
result can be used to obtain both primal and dual solutions for standard convex optimization
problems. In particular, we show how our result can be used to obtain improved running times for
semidefinite programming across a range of parameters.

10.1 From Feasibility to Optimization 

In this section we consider the following standard optimization problem. We are given a convex
function $f:\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$ and we want to find a point $x$ that approximately solves the
minimization problem

$$\min_{x} f(x)$$

given only a subgradient oracle for $f$.

Here we show how to apply the cutting plane method from Part I, turning the small width
guarantee of the output of that algorithm into a tool to find an approximate minimizer of $f$. Our
result is applicable to any convex optimization problem equipped with a separation or subgradient
oracle. This result will serve as the foundation for many of our applications in Part II.

Our approach is an adaptation of Nemirovski's method [85], which applies the cutting plane
method to solve convex optimization problems with only minimal assumptions on the cutting
plane method. The proof here is a generalization that accommodates the noisy separation
oracle used in this paper. In the remainder of this subsection we provide a key definition we will
use in our algorithm (Definition 41), provide our main result (Theorem 42), and conclude with a
brief discussion of this result.
Definition 41. For any compact set $K$, we define the minimum width by $\mathrm{MinWidth}(K) := \min_{\|a\|_2=1}\max_{x,y\in K}\langle a, x - y\rangle$.
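For intuition about Definition 41, the Python sketch below approximates the minimum width of a planar polytope given by its vertices. Scanning a fixed grid of unit directions is an assumption of the sketch (the exact value would require optimizing over all directions), so the result is an upper bound. It also shows the scaling identity $\mathrm{MinWidth}(x^* + \alpha(K - x^*)) = \alpha\,\mathrm{MinWidth}(K)$ used in the proof of Theorem 42.

```python
import math

def min_width_2d(points, num_directions=3600):
    """Approximate MinWidth(conv(points)) for points in R^2.

    For a unit direction a, max_{x,y in K} <a, x - y> equals
    max_p <a, p> - min_p <a, p> over the vertices p. Only a finite grid of
    directions in [0, pi) is scanned (a and -a give the same width), so the
    value returned is an upper bound on the true minimum width."""
    best = math.inf
    for k in range(num_directions):
        t = math.pi * k / num_directions
        a = (math.cos(t), math.sin(t))
        proj = [a[0] * p[0] + a[1] * p[1] for p in points]
        best = min(best, max(proj) - min(proj))
    return best

rect = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
print(min_width_2d(rect))                                   # about 1.0
print(min_width_2d([(0.5 * x, 0.5 * y) for x, y in rect]))  # about 0.5 (alpha = 0.5, x* = origin)
```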

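Theorem 42. Let $f$ be a convex function on $\mathbb{R}^n$ and $\Omega$ be a convex set that contains a minimizer
of $f$. Suppose we have an $(\eta,\delta)$-separation oracle for $f$ and $\Omega$ is contained inside $B_\infty(R)$. Using
$B_\infty(R)$ as the initial polytope for our Cutting Plane Method, for any $0 < \alpha < 1$, we can compute
a point $x$ such that

$$f(x) - \min_{y\in\Omega} f(y)\le\eta + \alpha\left(\max_{y\in\Omega} f(y) - \min_{y\in\Omega} f(y)\right) \qquad (10.1)$$

with an expected running time of

$$O\left(n\,\mathrm{SO}_{\eta,\delta}(f)\log\left(\frac{n\kappa}{\alpha}\right) + n^3\log^{O(1)}\left(\frac{n\kappa}{\alpha}\right)\right),$$

where $\delta = \Theta\left(\frac{\alpha\,\mathrm{MinWidth}(\Omega)}{n^{3/2}\ln(n\kappa/\alpha)}\right)$ and $\kappa = \frac{R}{\mathrm{MinWidth}(\Omega)}$. Furthermore, we only need the oracle to be defined on
the set $B_\infty(R)$.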


Proof. Let $x^*\in\arg\min_{x\in\Omega} f(x)$. Since $B_\infty(R)\supseteq\Omega$ contains a minimizer of $f$, by the definition of
$(\eta,\delta)$-separation oracles, our Cutting Plane Method (Theorem 31) either returns a point $x$ that is
almost optimal or returns a polytope $P$ of small width. In the former case we have a point $x$ such
that $f(x)\le\min_y f(y) + \eta$. Hence, the error is clearly at most $\eta + \alpha\left(\max_{y\in\Omega} f(y) - \min_{y\in\Omega} f(y)\right)$
as desired. Consequently, we assume the latter case.

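Theorem 31 shows $\mathrm{MinWidth}(P) < Cn\varepsilon\ln(R/\varepsilon)$ for some universal constant $C$. Picking

$$\varepsilon = C'\,\frac{\alpha\,\mathrm{MinWidth}(\Omega)}{n\ln(n\kappa/\alpha)} \qquad (10.2)$$

for a small enough constant $C'$, we have $\mathrm{MinWidth}(P) < \alpha\,\mathrm{MinWidth}(\Omega)$. Let $\Omega^\alpha = x^* + \alpha(\Omega - x^*)$,
namely, $\Omega^\alpha = \{x^* + \alpha(z - x^*) : z\in\Omega\}$. Then, we have

$$\mathrm{MinWidth}(\Omega^\alpha) = \alpha\,\mathrm{MinWidth}(\Omega) > \mathrm{MinWidth}(P).$$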

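Therefore, $\Omega^\alpha$ is not a subset of $P$ and hence there is some point $y\in\Omega^\alpha\setminus P$. Since $\Omega^\alpha\subseteq\Omega\subseteq
B_\infty(R)$, we know that $y$ does not violate any of the constraints of the initial polytope $P^{(0)} = B_\infty(R)$ and therefore must violate
one of the constraints added by querying the separation oracle. Therefore, for some $j\le i$, the point $y$ violates the $j$-th added constraint, that is,

$$\langle a^{(j)}, y\rangle > \langle a^{(j)}, x^{(j)}\rangle + C_s\varepsilon/\sqrt{n},$$

where $x^{(j)}$ is the $j$-th query point, $a^{(j)}$ is the unit normal returned by the oracle, and $C_s$ is a constant. By the definition of the $(\eta, C_s\varepsilon/\sqrt{n})$-separation oracle (Definition 2), we have $f(y) > f(x^{(j)})$. Since
$y\in\Omega^\alpha$, we have $y = (1-\alpha)x^* + \alpha z$ for some $z\in\Omega$. Thus, the convexity of $f$ implies that

$$f(y)\le(1-\alpha)f(x^*) + \alpha f(z).$$

Therefore, we have

$$\min_{1\le k\le i} f(x^{(k)}) - \min_{w\in\Omega} f(w) < f(y) - f(x^*)\le\alpha\left(\max_{z\in\Omega} f(z) - \min_{w\in\Omega} f(w)\right).$$

Hence, we can simply output the best $x$ among all $x^{(k)}$, and in either case $x$ satisfies (10.1).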

Note that we need to call the $(\eta,\delta)$-separation oracle with $\delta = \Theta(\varepsilon/\sqrt{n})$ to ensure we do not cut
out $x^*$. Theorem 31 shows that the algorithm takes $O(n\,\mathrm{SO}_{\eta,\delta}(f)\log(nR/\varepsilon) + n^3\log^{O(1)}(nR/\varepsilon))$
expected time, which is as promised since (10.2) gives $R/\varepsilon = O(n\kappa\log(n\kappa/\alpha)/\alpha)$. Furthermore, the oracle needs only be defined on $B_\infty(R)$ as our
cutting plane method guarantees $x^{(k)}\in B_\infty(R)$ for all $k$ (although, if needed, an obvious separating
hyperplane can be returned for a query point outside $B_\infty(R)$). □
Observe that this algorithm requires no information about $\Omega$ (other than that $\Omega\subseteq B_\infty(R)$)
and does not guarantee that the output is in $\Omega$. Hence, even though $\Omega$ can be complicated to
describe, the algorithm still gives a guarantee related to the gap $\max_{x\in\Omega} f(x) - \min_{x\in\Omega} f(x)$. For
specific applications, it is therefore advantageous to pick $\Omega$ as large as possible while keeping the bound
on the function value as small as possible.

Before delving into specific applications, we remark on the dependence on $\kappa$. Using John's
ellipsoid, it can be shown that any convex set $\Omega$ can be transformed by an affine map such that (1) $B_\infty(1)$
contains $\Omega$ and (2) $\mathrm{MinWidth}(\Omega)\ge 2/n$. In other words, $\kappa$ can be effectively chosen as $O(n)$.
Therefore, if we are able to find such a transformation, the running time is simply

$$O\left(n\,\mathrm{SO}_{\eta,\delta}(f)\log(n/\alpha) + n^3\log^{O(1)}(n/\alpha)\right).$$

Often this can be done easily using the structure of the
particular problem, and then the running time does not depend on the size of the domain at all.


10.2 Duality and Semidefinite Programming 

In this section we illustrate how our result in Section 10.1 can be used to obtain both primal and
dual solutions for standard problems in convex optimization. In particular, we show how to obtain
improved running times for semidefinite programming.

To explain our approach, consider the following minimax problem

$$\min_{y\in Y}\max_{x\in X}\ \langle Ax, y\rangle + \langle c, x\rangle + \langle d, y\rangle \qquad (10.3)$$

where $x\in\mathbb{R}^m$ and $y\in\mathbb{R}^n$. When $m\gg n$, solving this problem by directly using Part I could lead
to an inefficient algorithm with running time at least $m^3$. In many situations, for any fixed $y$, the
problem $\max_{x\in X}\langle Ax, y\rangle$ is very easy, and hence one can use it as a separation oracle and apply
Part I, which would give a running time almost independent of $m$. However, this would only
give us the $y$ variable, and it is not clear how to recover the $x$ variable from it.

In this section we show how to alleviate this issue and give semidefinite programming (SDP)
as a concrete example of how to apply this general technique. We do not write down the general
version, as the running time of the technique seems to be problem-specific and a faster SDP solver is already
an interesting application.

For the remainder of this section we focus on the semidefinite programming (SDP) problem:

$$\max_{X\succeq 0} C\bullet X \ \text{ s.t. } A_i\bullet X = b_i \qquad (10.4)$$

and its dual

$$\min_{y} b^\top y \ \text{ s.t. } \sum_{i=1}^{n} y_iA_i\succeq C \qquad (10.5)$$

where $X, C, A_i$ are $m\times m$ symmetric matrices and $b, y\in\mathbb{R}^n$. Our approach is partially inspired
by one of the key ideas of [51, 70]. These results write down the dual SDP in the form

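$$\min_{y}\ b^\top y - K\min\left(\lambda_{\min}\left(\sum_{i=1}^{n} y_iA_i - C\right), 0\right) \qquad (10.6)$$

for some large number $K$ and use non-smooth optimization techniques to solve the dual SDP
problem. Here, we follow the same approach but instead write it as a min-max problem $\min_y f_K(y)$,
where

$$f_K(y) = \max_{\mathrm{Tr}X\le K,\ X\succeq 0}\left(b^\top y + \left\langle X, C - \sum_{i=1}^{n} y_iA_i\right\rangle\right). \qquad (10.7)$$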
Thus the SDP problem in fact assumes the form (10.3), and many ideas in this section can be
generalized to the minimax problem (10.3).
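One way to see (10.7) concretely: the inner maximum of $\langle X, M\rangle$ over $\{X\succeq 0, \mathrm{Tr}X\le K\}$ equals $K\max(\lambda_{\max}(M), 0)$, attained at $X = Kuu^\top$ for a top unit eigenvector $u$ (or at $X = 0$). With $M = C - \sum_i y_iA_i$ this is exactly the penalty in (10.6), since $-\min(\lambda_{\min}(-M), 0) = \max(\lambda_{\max}(M), 0)$. The NumPy sketch below evaluates $f_K$ this way; it is our own helper and uses a dense eigenvalue routine rather than the fast method of Lemma 43.

```python
import numpy as np

def f_K(y, b, C, As, K):
    """Evaluate (10.7) in closed form: b^T y + K * max(lambda_max(C - sum_i y_i A_i), 0)."""
    M = C - sum(yi * Ai for yi, Ai in zip(y, As))
    lam_max = np.linalg.eigvalsh(M)[-1]      # eigvalsh returns eigenvalues in ascending order
    return float(b @ y + K * max(lam_max, 0.0))
```

For the toy instance $m = 2$, $n = 1$ with $C = \mathrm{diag}(1, 0)$, $A_1 = I$ and $b = (1)$, the dual optimum is $y = 1$ with $\mathrm{OPT} = 1$, and $f_K(y) = y + K\max(1 - y, 0)$ is minimized there whenever $K\ge 1$.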

To get a dual solution, we notice that the cutting plane method maintains a subset $\mathrm{conv}(X_j)$ of the primal
feasible solutions such that

$$\min_y\left(b^\top y + \max_{X\in\mathrm{conv}(X_j)}\left\langle X, C - \sum_{i=1}^{n} y_iA_i\right\rangle\right)\approx\min_y\left(b^\top y + \max_{\mathrm{Tr}X\le K,\ X\succeq 0}\left\langle X, C - \sum_{i=1}^{n} y_iA_i\right\rangle\right).$$

Applying the minimax theorem, this shows that there exists an approximate solution $X$ in $\mathrm{conv}(X_j)$
for the primal problem. Hence, we can restrict the primal SDP to the polytope $\mathrm{conv}(X_j)$, which
reduces the primal SDP to a linear program that can be solved very efficiently. This idea of
getting primal/dual solutions from the cutting plane method is quite general and is the main purpose
of this example. As a by-product, we obtain a faster SDP solver for both the primal and the dual! We remark
that this idea has been used as a heuristic in [69] for getting the primal SDP solution, and
our contribution here is mainly the asymptotic running time analysis.

We first show how to construct the separation oracle for SDP. For that we need to compute the
smallest eigenvector of a matrix. Below, for completeness, we provide a folklore result showing we
can do this using fast matrix multiplication.


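Lemma 43. Given an $n\times n$ symmetric matrix $Y$ such that $-RI\preceq Y\preceq RI$, for any $\varepsilon > 0$,
with high probability in $n$, in time $O(n^{\omega+o(1)}\log^{O(1)}(R/\varepsilon))$ we can find a unit vector $u$ such that

$$u^\top Yu\ge\lambda_{\max}(Y) - \varepsilon.$$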

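Proof. Let $B := \frac{1}{R}Y + I$. Note that $0\preceq B\preceq 2I$. Now, we consider the repeated squaring $B_0 = B$ and

$$B_{k+1} = \frac{B_k^2}{\mathrm{Tr}(B_k^2)}.$$

Let $0\le\lambda_1\le\lambda_2\le\cdots\le\lambda_n$ be the eigenvalues of $B$ and $v_i$ be the corresponding
eigenvectors. Then, it is easy to see that the eigenvalues of $B_k$ are

$$\frac{\lambda_i^{2^k}}{\sum_{j=1}^{n}\lambda_j^{2^k}}.$$

Let $q$ be a random unit vector and $r := B_kq$. Now $q = \sum_i\alpha_iv_i$ for some $\alpha_i$ such that $\sum_i\alpha_i^2 = 1$.
Letting

$$p = \frac{\sum_{\lambda_i\ge(1-\delta)\lambda_n}\alpha_i\lambda_i^{2^k}v_i}{\sum_{j=1}^{n}\lambda_j^{2^k}},$$

we have

$$\|r - p\|_2 = \left\|\frac{\sum_{\lambda_i<(1-\delta)\lambda_n}\alpha_i\lambda_i^{2^k}v_i}{\sum_{j=1}^{n}\lambda_j^{2^k}}\right\|_2\le\frac{\sum_{\lambda_i<(1-\delta)\lambda_n}\lambda_i^{2^k}}{\lambda_n^{2^k}}\le(1-\delta)^{2^k}n.$$

With constant probability, we have $|\alpha_n| = \Omega(1/\sqrt{n})$, and since $\sum_j\lambda_j^{2^k}\le n\lambda_n^{2^k}$, this gives $\|p\|_2\ge|\alpha_n|/n = \Omega(n^{-3/2})$. Using $(1-\delta)^{2^k}\le e^{-\delta 2^k}$, we can choose $k = O(\log(\delta^{-1}\log(n/\delta)))$ such that $\|r - p\|_2\le\delta\|p\|_2$. Since $0\preceq B\preceq 2I$, we have

$$\sqrt{r^\top Br}\ge\sqrt{p^\top Bp} - \sqrt{(r-p)^\top B(r-p)}\ge\sqrt{p^\top Bp} - \sqrt{2}\,\delta\|p\|_2.$$

Note that $p$ involves only eigenvectors with eigenvalues between $(1-\delta)\lambda_n$ and $\lambda_n$. Hence, we have

$$\sqrt{r^\top Br}\ge\left(\sqrt{(1-\delta)\lambda_n} - \sqrt{2}\,\delta\right)\|p\|_2.$$

Using $\|r\|_2\le\|p\|_2 + \|r - p\|_2\le(1+\delta)\|p\|_2$ and $\lambda_n\le 2$, we have that, so long as $\delta$ is at most a small enough universal constant,

$$\frac{\sqrt{r^\top Br}}{\|r\|_2}\ge\frac{\sqrt{(1-\delta)\lambda_n} - \sqrt{2}\,\delta}{1+\delta} = (1 - O(\delta))\sqrt{\lambda_n} - O(\delta) = \sqrt{\lambda_n} - O(\delta).$$

Therefore, since $Y = R(B - I)$ and $\lambda_{\max}(Y) = R(\lambda_n - 1)$, we have

$$\frac{r^\top Yr}{\|r\|_2^2}\ge\lambda_{\max}(Y) - O(R\delta).$$

Hence, we can find the vector $r$ by computing $k$ matrix multiplications. [24] showed that fast matrix multiplication is stable under the Frobenius norm, i.e.,
for any $\eta > 0$, using $O(\log(n/\delta))$ bits, we can find $C$ such that $\|C - AB\|_F\le\delta\|A\|_F\|B\|_F$ in
time $O(n^{\omega+\eta})$, where $\omega$ is the matrix multiplication constant. Hence, this algorithm takes only
$O(n^{\omega+o(1)}\log^{O(1)}(\delta^{-1}))$ time. The result follows from renormalizing the vector $r$, repeating the
algorithm $O(\log n)$ times to boost the success probability (keeping the vector with the largest value of $u^\top Yu$), and taking $\delta = \Theta(\varepsilon/R)$. □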
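The repeated-squaring argument above translates directly into code. The NumPy sketch below makes simplifying assumptions of our own: it uses ordinary floating-point products instead of the stable fast matrix multiplication of [24], a fixed number of squarings `k` rather than the value prescribed in the proof, and keeps the best of a few random starts as in the boosting step.

```python
import numpy as np

def approx_top_eigenvector(Y, R, k=20, trials=5, seed=0):
    """Sketch of the proof of Lemma 43 for symmetric Y with -R I <= Y <= R I.

    Forms B = Y/R + I (so 0 <= B <= 2I), computes B_k = B^(2^k) / Tr(B^(2^k))
    by k normalized squarings, applies it to random unit vectors, and keeps
    the normalized vector with the largest Rayleigh quotient u^T Y u.
    Assumes B != 0, i.e. Y is not -R I."""
    n = Y.shape[0]
    B = Y / R + np.eye(n)
    Bk = B.copy()
    for _ in range(k):              # B_{j+1} = B_j^2 / Tr(B_j^2)
        Bk = Bk @ Bk
        Bk /= np.trace(Bk)
    rng = np.random.default_rng(seed)
    best_u, best_val = None, -np.inf
    for _ in range(trials):         # boost the constant success probability
        q = rng.standard_normal(n)
        q /= np.linalg.norm(q)
        r = Bk @ q
        u = r / np.linalg.norm(r)
        val = float(u @ Y @ u)
        if val > best_val:
            best_u, best_val = u, val
    return best_u, best_val
```

For example, `approx_top_eigenvector(np.diag([3.0, -1.0, 0.5]), R=4.0)` returns a unit vector along the first coordinate axis with Rayleigh quotient 3.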


The following lemma shows how to compute a separation oracle for $f_K$ defined in (10.7).

Lemma 44. Suppose that $\|A_i\|_F\le M$ and $\|C\|_F\le M$. For any $0 < \varepsilon < 1$ and $y$ with $\|y\|_2 =
O(L)$, with high probability in $m$, we can compute an $(\varepsilon,\varepsilon)$-separation of $f_K$ on $\{\|x\|_2\le L\}$ at $y$
in time $O(S + m^{\omega+o(1)}\log^{O(1)}(nKML/\varepsilon))$, where $S$ is the sparsity of the problem defined as
$\mathrm{nnz}(C) + \sum_{i=1}^{n}\mathrm{nnz}(A_i)$.

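Proof. Note that $-O(nML)I\preceq C - \sum_{i=1}^{n} y_iA_i\preceq O(nML)I$. Using Lemma 43 (with accuracy $\delta/K$, taking $v = \sqrt{K}u$ for the resulting unit vector $u$, or $v = 0$ if $u^\top(C - \sum_i y_iA_i)u < 0$), we can find a vector
$v$ with $\|v\|_2^2\le K$ in time $O(S + m^{\omega+o(1)}\log^{O(1)}(nKML/\delta))$ such that

$$v^\top\left(C - \sum_{i=1}^{n} y_iA_i\right)v\ge\max_{\mathrm{Tr}X\le K,\ X\succeq 0}\left\langle X, C - \sum_{i=1}^{n} y_iA_i\right\rangle - \delta. \qquad (10.8)$$

In other words, we have a $\delta$-optimization oracle for the function $f_K$. Lemma 40 shows this yields a
$\delta$-subgradient oracle, and Lemma 38 then shows this yields an $(O(\sqrt{\delta}L), O(\sqrt{\delta}L))$-separation oracle
on the set $\{\|x\|_2\le L\}$. By picking $\delta = \Theta(\varepsilon^2/L^2)$, we have the promised oracle. □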
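The oracle in the proof of Lemma 44 can be sketched as follows. For clarity, the NumPy code below replaces Lemma 43 with an exact dense eigendecomposition (`numpy.linalg.eigh`), so it illustrates correctness rather than speed. It returns $f_K(y)$ together with the subgradient $g_i = b_i - v^\top A_iv$ that Lemma 40 produces from the maximizer $X = vv^\top$; the function name is ours.

```python
import numpy as np

def f_K_subgradient(y, b, C, As, K):
    """Return (f_K(y), g) for f_K in (10.7), with g_i = b_i - v^T A_i v.

    X = v v^T with ||v||^2 <= K maximizes <X, C - sum_i y_i A_i> over
    {X psd, Tr X <= K}: v = sqrt(K) * (top unit eigenvector) if the top
    eigenvalue is positive, and v = 0 otherwise."""
    M = C - sum(yi * Ai for yi, Ai in zip(y, As))
    evals, evecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    lam, u = evals[-1], evecs[:, -1]
    v = np.sqrt(K) * u if lam > 0 else np.zeros_like(u)
    value = float(b @ y + v @ M @ v)
    g = np.array([bi - v @ Ai @ v for bi, Ai in zip(b, As)])
    return value, g
```

On the toy instance $C = \mathrm{diag}(1, 0)$, $A_1 = I$, $b = (1)$, $K = 3$, the call at $y = 0.5$ gives $f_K = 2$ and $g = -2$, and the subgradient inequality $f_K(y')\ge f_K(y) + g(y' - y)$ holds at every $y'$.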


With the separation oracle in hand, we are ready to give the algorithm for SDP: 


Theorem 45. Given a primal-dual semidefinite programming problem in the form (10.4) and
(10.5), suppose that for some $M\ge 1$ we have

1. $\|b\|_2\le M$, $\|C\|_F\le M$ and $\|A_i\|_F\le M$ for all $i$.

2. The primal feasible set lies inside the region $\mathrm{Tr}X\le M$.

3. The dual feasible set lies inside the region $\|y\|_\infty\le M$.

Let OPT be the optimum value of (10.4) and (10.5). Then, with high probability, we can find $X$
and $y$ such that

1. $X\succeq 0$, $\mathrm{Tr}X = O(M)$, $|b_i - \langle X, A_i\rangle|\le\varepsilon$ for all $i$, and $C\bullet X\ge\mathrm{OPT} - \varepsilon$.
2. $\|y\|_\infty = O(M)$, $\sum_{i=1}^{n} y_iA_i\succeq C - \varepsilon I$ and $b^\top y\le\mathrm{OPT} + \varepsilon$.

in expected time $O\left((nS + n^3 + nm^{\omega+o(1)})\log^{O(1)}(nM/\varepsilon)\right)$, where $S$ is the sparsity of the problem
defined as $\mathrm{nnz}(C) + \sum_{i=1}^{n}\mathrm{nnz}(A_i)$ and $\omega$ is the fast matrix multiplication constant.

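Proof. Let $K\ge M$ be some parameter to be determined. Since the primal feasible set lies inside
the region $\mathrm{Tr}X\le M\le K$, we have

$$\begin{aligned}
\min_{\sum_i y_iA_i\succeq C} b^\top y &= \max_{X\succeq 0,\ \mathrm{Tr}X\le K,\ A_i\bullet X = b_i} C\bullet X\\
&= \max_{X\succeq 0,\ \mathrm{Tr}X\le K}\ \min_y\ C\bullet X - \sum_{i=1}^{n} y_i(A_i\bullet X - b_i)\\
&= \min_y\ \max_{X\succeq 0,\ \mathrm{Tr}X\le K}\ b^\top y + \left(C - \sum_{i=1}^{n} y_iA_i\right)\bullet X\\
&= \min_y f_K(y).
\end{aligned}$$

Lemma 44 shows that it takes $\mathrm{SO}_{\delta,\delta}(f_K) = O(S + m^{\omega+o(1)}\log^{O(1)}(nKML/\delta))$ time to compute a
$(\delta,\delta)$-separation oracle of $f_K$ for any point $y$ with $\|y\|_\infty = O(L)$, where $L$ is some parameter with $L\ge
M$. Taking the radius $R = L$, Theorem 42 shows that it takes $O\left(n\,\mathrm{SO}_{\delta,\delta}(f_K)\log(n/\alpha) + n^3\log^{O(1)}(n/\alpha)\right)$
expected time with $\delta = \Theta\left(\frac{\alpha L}{n^{3/2}\ln(n/\alpha)}\right)$ to find $y$ such that

$$f_K(y) - \min_{\|y\|_\infty\le L} f_K(y)\le\delta + \alpha\left(\max_{\|y\|_\infty\le L} f_K(y) - \min_{\|y\|_\infty\le L} f_K(y)\right)\le\delta + 2\alpha(nML + 2nKML).$$

Picking $\alpha = \Theta\left(\frac{\varepsilon}{nKML}\right)$ with a small enough constant (so that also $\delta\le\varepsilon/2$), we have $f_K(y)\le\min_{\|y\|_\infty\le L} f_K(y) + \varepsilon$. Since $f_K(y) = b^\top y + K\max(\lambda_{\max}(C - \sum_i y_iA_i), 0)$, and the dual optimum $y^*$ satisfies $\|y^*\|_\infty\le M\le L$ and $f_K(y^*) = \mathrm{OPT}$, this gives

$$b^\top y + K\max\left(\lambda_{\max}\left(C - \sum_{i=1}^{n} y_iA_i\right), 0\right)\le\mathrm{OPT} + \varepsilon.$$

Let $\beta = \max(\lambda_{\max}(C - \sum_{i=1}^{n} y_iA_i), 0)$. Then, we have that $\sum_{i=1}^{n} y_iA_i\succeq C - \beta I$ and

$$\begin{aligned}
b^\top y &\ge\min_{\sum_i y_iA_i\succeq C - \beta I} b^\top y\\
&= \max_{X\succeq 0,\ A_i\bullet X = b_i}(C - \beta I)\bullet X\\
&\ge\mathrm{OPT} - \beta M
\end{aligned}$$

because $\mathrm{Tr}X\le M$. Hence, we have

$$\mathrm{OPT} - \beta M + \beta K\le b^\top y + K\max\left(\lambda_{\max}\left(C - \sum_{i=1}^{n} y_iA_i\right), 0\right)\le\mathrm{OPT} + \varepsilon.$$

Putting $K = M + 1$, we have $\beta\le\varepsilon$. Thus,

$$\sum_{i=1}^{n} y_iA_i\succeq C - \varepsilon I.$$

This gives the result for the dual with the running time $O\left((nS + n^3 + nm^{\omega+o(1)})\log^{O(1)}(nM/\varepsilon)\right)$.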
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SIS IV 




Our Cutting Plane Method accesses the sub-problem 


max 

XbO,TrX<X 


(C- 

i 


only through the separation oracle. Let zhe the output of our Cutting Plane Method and ^ 

be the matrices used to construct the separation for the 0(n) hyperplanes the algorithm maintains 
at the end. Let u be the maximum eigenvector of C — Now, we consider a realization 

of fK 

fK{y) = + max ( C - V yiAi ) . 

'K.&com(Kuu^,Vivf) \ / 

Since applying our Cutting Plane Method to either Jk or fx gives the same result, the correctness 
of the our Cutting Plane Method shows that 

< min fK{y) + e. 
y <L 

I m 11 oo 

Note that the function fx is defined such that fx{z) = Hence, we have 

min fx{y) < /i^(z) < fx{^ < min fx{y) + e. 

Also, note that fx{x) < fxix) for all x. Hence, we have 


min fx{y) - e < min f{y) < min fx{y)- 
k S’ S' <L 


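Now, we consider the primal version of $\tilde f_K$, namely

$$g(X) = \min_{\|y\|_\infty \le L} \Big( b^\top y + \Big\langle X, C - \sum_{i=1}^n y_i A_i \Big\rangle \Big).$$

Sion's minimax theorem [98] shows that

$$\mathrm{OPT} \ge \max_{X \in \mathrm{conv}(K u u^\top,\, v_i v_i^\top)} g(X) = \min_{\|y\|_\infty \le L} \tilde f_K(y) \ge \mathrm{OPT} - \varepsilon.$$

Therefore, to get the primal solution, we only need to find $u$ by Lemma 43 and solve the maximization problem on $g$. Note that

$$g(X) = \min_{\|y\|_\infty \le L} \sum_{i=1}^n y_i\big(b_i - \langle X, A_i\rangle\big) + \langle X, C\rangle = -L\sum_{i=1}^n \big|b_i - \langle X, A_i\rangle\big| + \langle X, C\rangle.$$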
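The closed form uses the fact that a linear function $\sum_i y_i r_i$ over the box $\|y\|_\infty \le L$ is minimized at $y_i = -L\,\mathrm{sign}(r_i)$. The following small check compares the closed form of $g(X)$ with a brute-force minimum over the $2^n$ box vertices. It is an illustrative sketch with dense matrices as nested lists; `frob`, `g_closed_form` and `g_brute_force` are hypothetical names.

```python
import itertools


def frob(X, Y):
    """Frobenius inner product <X, Y> of two dense matrices."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def g_closed_form(X, C, As, b, L):
    """g(X) = -L * sum_i |b_i - <X, A_i>| + <X, C>."""
    r = [bi - frob(X, Ai) for bi, Ai in zip(b, As)]
    return -L * sum(abs(ri) for ri in r) + frob(X, C)


def g_brute_force(X, C, As, b, L):
    """min over ||y||_inf <= L of b^T y + <X, C - sum_i y_i A_i>, checking only the
    2^n vertices of the box (a linear function attains its minimum at one of them)."""
    r = [bi - frob(X, Ai) for bi, Ai in zip(b, As)]
    best = min(sum(s * L * ri for s, ri in zip(signs, r))
               for signs in itertools.product((-1.0, 1.0), repeat=len(r)))
    return best + frob(X, C)
```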


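For notational simplicity, we write $K u u^\top = v_0 v_0^\top$. Then $X = \sum_j \alpha_j v_j v_j^\top$ for some $\alpha_j \ge 0$ with $\sum_j \alpha_j = 1$. Substituting this into the function $g$, we have

$$g(\alpha) = -L\sum_i \Big| b_i - \sum_j \alpha_j v_j^\top A_i v_j \Big| + \sum_j \alpha_j v_j^\top C v_j.$$
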
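Hence, this can easily be written as a linear program with $O(n)$ variables and $O(n)$ constraints in time $O(nS)$. Now, we can apply an interior point method to find $\alpha$ such that

$$g(\alpha) \ge \max_{X \in \mathrm{conv}(K u u^\top,\, v_i v_i^\top)} g(X) - \varepsilon \ge \mathrm{OPT} - 2\varepsilon.$$

Let the corresponding approximate solution be $X = \sum_j \alpha_j v_j v_j^\top$. Then we have

$$\langle X, C\rangle - L\sum_i \big|b_i - \langle X, A_i\rangle\big| \ge \mathrm{OPT} - 2\varepsilon.$$

Now, let $\tilde b_i = \langle X, A_i\rangle$. Then we note that

$$\langle X, C\rangle \le \max_{X' \succeq 0,\ A_i \bullet X' = \tilde b_i} C \bullet X' = \min_{\sum_i y_i A_i \succeq C} \tilde b^\top y \le \mathrm{OPT} + M\sum_i \big|b_i - \langle X, A_i\rangle\big|,$$

because $\|y\|_\infty \le M$ for the dual solutions. Hence, we have

$$\mathrm{OPT} + (M - L)\sum_i \big|b_i - \langle X, A_i\rangle\big| \ge \langle X, C\rangle - L\sum_i \big|b_i - \langle X, A_i\rangle\big| \ge \mathrm{OPT} - 2\varepsilon.$$

Now, putting $L = M + 2$, we have

$$\sum_i \big|b_i - \langle X, A_i\rangle\big| \le \varepsilon \quad\text{and}\quad \langle X, C\rangle \ge \mathrm{OPT} - 2\varepsilon.$$

This gives the result for the primal. Note that solving a linear program with $O(n)$ variables and $O(n)$ constraints takes time polynomial in $n$ and only $\log^{O(1)}(nM/\varepsilon)$ in the accuracy, because we have an explicit interior point deep inside the feasible set [76]. Hence, the running time is dominated by the cost of the cutting plane method, and putting $L = M + 2$ gives the claimed bound. □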

We leave it as an open problem whether this result can be improved by reusing the computation in the separation oracle, thereby achieving a faster running time.


11 Intersection of Convex Sets 

In this section we introduce a general technique to optimize a linear function over the intersection of two convex sets, whenever the linear optimization problem on each of them can be solved efficiently. At a very high level, this is accomplished by applying our cutting plane method to a suitably regularized version of the problem. In Section 11.1 we present the technique, and in the remaining sections we provide several applications, including matroid intersection (Section 11.2), submodular flow (Section 11.3), and optimizing over the intersection of an affine subspace and a convex set (Section 11.4).

Without this, the running time of the interior point method would depend on the bit complexity of the linear program.
11.1 The Technique

Throughout this section we consider variants of the following general optimization problem

$$\max_{x \in K_1 \cap K_2} \langle c, x\rangle \tag{11.1}$$

where $x, c \in \mathbb{R}^n$ and $K_1$ and $K_2$ are convex subsets of $\mathbb{R}^n$. We assume that

$$\max_{x \in K_1}\|x\|_2 \le M,\quad \max_{x \in K_2}\|x\|_2 \le M,\quad \|c\|_2 \le M \tag{11.2}$$

for some constant $M \ge 1$, and we assume that

$$K_1 \cap K_2 \ne \emptyset. \tag{11.3}$$

Instead of a separation oracle, we assume that $K_1$ and $K_2$ each have an optimization oracle (see Section 9).

To solve this problem we first introduce a relaxation of problem (11.1) that we can optimize efficiently. Because we only have optimization oracles for $K_1$ and $K_2$, we use separate variables $x$ and $y$ for each of them in the objective. Since the output should (approximately) lie in the intersection of $K_1$ and $K_2$, a regularization term $-\frac{\lambda}{2}\|x - y\|_2^2$ is added to force $x \approx y$, where $\lambda$ is a large number to be determined later. Furthermore, we add terms to make the problem strongly concave.


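Lemma 46. Assume (11.2) and (11.3). For $\lambda \ge 1$, let

$$f_\lambda(x, y) = \frac12\langle c, x\rangle + \frac12\langle c, y\rangle - \frac{\lambda}{2}\|x - y\|_2^2 - \frac{1}{2\lambda}\|x\|_2^2 - \frac{1}{2\lambda}\|y\|_2^2. \tag{11.4}$$

There is a unique maximizer $(x_\lambda, y_\lambda)$ for the problem $\max_{x \in K_1, y \in K_2} f_\lambda(x, y)$. The maximizer $(x_\lambda, y_\lambda)$ is a good approximation of the solution of (11.1), i.e. $\|x_\lambda - y_\lambda\|_2^2 \le \frac{6M^2}{\lambda}$ and

$$\max_{x \in K_1 \cap K_2} \langle c, x\rangle \le f_\lambda(x_\lambda, y_\lambda) + \frac{M^2}{\lambda}. \tag{11.5}$$

Proof. Let $x^*$ be a maximizer of $\max_{x \in K_1 \cap K_2}\langle c, x\rangle$. By assumption (11.2), $\|x^*\|_2 \le M$, and therefore

$$f_\lambda(x^*, x^*) = \langle c, x^*\rangle - \frac{\|x^*\|_2^2}{\lambda} \ge \max_{x \in K_1 \cap K_2}\langle c, x\rangle - \frac{M^2}{\lambda}. \tag{11.6}$$

Since $f_\lambda(x_\lambda, y_\lambda) \ge f_\lambda(x^*, x^*)$, this shows (11.5). Since $f_\lambda$ is strongly concave in $x$ and $y$, there is a unique maximizer $(x_\lambda, y_\lambda)$. Let $\mathrm{OPT}_\lambda = f_\lambda(x_\lambda, y_\lambda)$. Then we have

$$\mathrm{OPT}_\lambda \le \frac12\|c\|_2\|x_\lambda\|_2 + \frac12\|c\|_2\|y_\lambda\|_2 - \frac{\lambda}{2}\|x_\lambda - y_\lambda\|_2^2 \le \frac{M^2}{2} + \frac{M^2}{2} - \frac{\lambda}{2}\|x_\lambda - y_\lambda\|_2^2.$$

On the other hand, using $\lambda \ge 1$, (11.6) shows that

$$\mathrm{OPT}_\lambda \ge f_\lambda(x^*, x^*) \ge \max_{x \in K_1 \cap K_2}\langle c, x\rangle - \frac{M^2}{\lambda} \ge -2M^2.$$

Hence, we have

$$\|x_\lambda - y_\lambda\|_2^2 \le \frac{2(M^2 - \mathrm{OPT}_\lambda)}{\lambda} \le \frac{6M^2}{\lambda}. \tag{11.7}$$

□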
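Lemma 46 can be checked numerically on a toy instance. The sketch below assumes, for illustration only, that $K_1$ and $K_2$ are two overlapping axis-aligned boxes in $\mathbb{R}^2$ (so $M = 1$). It approximates $(x_\lambda, y_\lambda)$ by projected gradient ascent. That solver is a swapped-in technique: the paper never computes $(x_\lambda, y_\lambda)$ directly and only uses its existence. The function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def f_lam(x, y, c, lam):
    """f_lambda(x, y) from (11.4)."""
    d = [a - b for a, b in zip(x, y)]
    return (0.5 * dot(c, x) + 0.5 * dot(c, y) - lam / 2 * dot(d, d)
            - dot(x, x) / (2 * lam) - dot(y, y) / (2 * lam))


def project_box(v, box):
    lo, hi = box
    return [min(max(a, l), h) for a, l, h in zip(v, lo, hi)]


def maximize_f_lam(c, lam, box1, box2, iters=20000):
    """Projected gradient ascent over K1 x K2 (here two boxes). The step
    1 / (2*lam + 1/lam) is 1 / (Lipschitz constant of the gradient of f_lambda)."""
    step = 1.0 / (2 * lam + 1 / lam)
    x = [0.0] * len(c)
    y = [0.0] * len(c)
    for _ in range(iters):
        gx = [ci / 2 - lam * (a - b) - a / lam for ci, a, b in zip(c, x, y)]
        gy = [ci / 2 + lam * (a - b) - b / lam for ci, a, b in zip(c, x, y)]
        x = project_box([a + step * g for a, g in zip(x, gx)], box1)
        y = project_box([b + step * g for b, g in zip(y, gy)], box2)
    return x, y
```

For $c = (0.6, 0.8)$, $K_1 = [-0.5, 0.5]^2$ and $K_2 = [-0.6, 0.6] \times [-0.6, 0.2]$, the true optimum of (11.1) is $0.46$. The computed pair should satisfy both conclusions of the lemma.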
Now we write $\max f_\lambda(x, y)$ as a max-min problem. The reason for doing this is that a dual approximate solution is much easier to obtain, and there is a way to read off a primal approximate solution from a dual approximate solution. This is analogous to the idea in [73], which showed how to convert a cut solution to a flow solution by adding regularization terms to the problem.

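Lemma 47. Assume (11.2) and (11.3). Let $\lambda \ge 2$. For any $x \in K_1$ and $y \in K_2$, the function $f_\lambda$ can be represented as

$$f_\lambda(x, y) = \min_{(\theta_1, \theta_2, \theta_3) \in \Omega} g_\lambda(x, y, \theta_1, \theta_2, \theta_3) \tag{11.8}$$

where $\Omega = \{(\theta_1, \theta_2, \theta_3) : \|\theta_1\|_2 \le 2M,\ \|\theta_2\|_2 \le 2M,\ \|\theta_3\|_2 \le 2M\}$ and

$$g_\lambda(x, y, \theta_1, \theta_2, \theta_3) = \Big\langle \frac{c}{2} - \lambda\theta_1 - \frac{\theta_2}{\lambda},\, x \Big\rangle + \Big\langle \frac{c}{2} + \lambda\theta_1 - \frac{\theta_3}{\lambda},\, y \Big\rangle + \frac{\lambda}{2}\|\theta_1\|_2^2 + \frac{1}{2\lambda}\|\theta_2\|_2^2 + \frac{1}{2\lambda}\|\theta_3\|_2^2. \tag{11.9}$$

Let $h_\lambda(\theta_1, \theta_2, \theta_3) = \max_{x \in K_1, y \in K_2} g_\lambda(x, y, \theta_1, \theta_2, \theta_3)$. For any $(\theta'_1, \theta'_2, \theta'_3)$ such that $h_\lambda(\theta'_1, \theta'_2, \theta'_3) \le \min_{\Omega} h_\lambda + \varepsilon$, the point $z = \frac12(\theta'_2 + \theta'_3)$ satisfies

$$\max_{x \in K_1 \cap K_2}\langle c, x\rangle \le \langle c, z\rangle + \frac{20M^2}{\lambda} + 20\lambda^3\varepsilon$$

and $\|z - x_\lambda\|_2 + \|z - y_\lambda\|_2 \le 4\sqrt{2\lambda\varepsilon} + \sqrt{6M^2/\lambda}$, where $(x_\lambda, y_\lambda)$ is the unique maximizer of $\max_{x \in K_1, y \in K_2} f_\lambda(x, y)$.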

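Proof. Note that for any $\|a\|_2 \le \alpha$, we have

$$-\frac12\|a\|_2^2 = \min_{\|\theta\|_2 \le \alpha}\Big(-\langle a, \theta\rangle + \frac12\|\theta\|_2^2\Big),$$

with the minimum attained at $\theta = a$. Applying this with $a = x - y$ (for which $\|x - y\|_2 \le 2M$), $a = x$ and $a = y$, and using (11.2), we obtain (11.8) for all $x \in K_1$ and $y \in K_2$ as desired. Since $\Omega$ is a closed and bounded set and the function $g_\lambda$ is concave in $(x, y)$ and convex in $(\theta_1, \theta_2, \theta_3)$, Sion's minimax theorem [98] shows that

$$\max_{x \in K_1, y \in K_2} f_\lambda(x, y) = \min_{(\theta_1, \theta_2, \theta_3) \in \Omega} h_\lambda(\theta_1, \theta_2, \theta_3). \tag{11.10}$$

Since $f_\lambda$ is strongly concave, there is a unique maximizer $(x_\lambda, y_\lambda)$ of $f_\lambda$. Since $h_\lambda$ is strongly convex, there is a unique minimizer $(\theta_1^*, \theta_2^*, \theta_3^*)$. By the definitions of $f_\lambda$ and $h_\lambda$, we have

$$h_\lambda(\theta_1^*, \theta_2^*, \theta_3^*) \ge g_\lambda(x_\lambda, y_\lambda, \theta_1^*, \theta_2^*, \theta_3^*) \ge f_\lambda(x_\lambda, y_\lambda).$$

Using (11.10), equality holds above, and hence $(\theta_1^*, \theta_2^*, \theta_3^*)$ is the minimizer of $g_\lambda(x_\lambda, y_\lambda, \theta_1, \theta_2, \theta_3)$ over $(\theta_1, \theta_2, \theta_3) \in \Omega$. Since $\|x_\lambda - y_\lambda\|_2 \le \sqrt{6/\lambda}\,M < 2M$ (by (11.7) and $\lambda \ge 2$) and $\|x_\lambda\|_2, \|y_\lambda\|_2 \le M$, the domain $\Omega$ is large enough that this minimizer is an interior point of $\Omega$. The optimality condition of $g_\lambda$ then shows that $\theta_2^* = x_\lambda$ and $\theta_3^* = y_\lambda$.

Since $h_\lambda$ is $\frac{1}{\lambda}$-strongly convex, we have $\|\theta'_1 - \theta^*_1\|_2^2 + \|\theta'_2 - \theta^*_2\|_2^2 + \|\theta'_3 - \theta^*_3\|_2^2 \le 2\lambda\varepsilon$ (Fact 37). Since $\theta_2^* = x_\lambda$ and $\theta_3^* = y_\lambda$, we have

$$\|\theta'_2 - x_\lambda\|_2^2 + \|\theta'_3 - y_\lambda\|_2^2 \le 2\lambda\varepsilon. \tag{11.11}$$

Therefore, $\|\theta'_2 - \theta'_3\|_2 \le \|x_\lambda - y_\lambda\|_2 + 2\sqrt{2\lambda\varepsilon}$, $\|\theta'_2\|_2 \le \|x_\lambda\|_2 + \sqrt{2\lambda\varepsilon}$ and $\|\theta'_3\|_2 \le \|y_\lambda\|_2 + \sqrt{2\lambda\varepsilon}$. Using these and $\|c\|_2 \le M$, we have

$$\begin{aligned}
f_\lambda(\theta'_2, \theta'_3) &= \frac12\langle c, \theta'_2\rangle + \frac12\langle c, \theta'_3\rangle - \frac{\lambda}{2}\|\theta'_2 - \theta'_3\|_2^2 - \frac{1}{2\lambda}\|\theta'_2\|_2^2 - \frac{1}{2\lambda}\|\theta'_3\|_2^2\\
&\ge \frac12\langle c, x_\lambda\rangle + \frac12\langle c, y_\lambda\rangle - M\sqrt{2\lambda\varepsilon} - \frac{\lambda}{2}\Big(\|x_\lambda - y_\lambda\|_2 + 2\sqrt{2\lambda\varepsilon}\Big)^2\\
&\qquad - \frac{1}{2\lambda}\Big(\|x_\lambda\|_2 + \sqrt{2\lambda\varepsilon}\Big)^2 - \frac{1}{2\lambda}\Big(\|y_\lambda\|_2 + \sqrt{2\lambda\varepsilon}\Big)^2\\
&= f_\lambda(x_\lambda, y_\lambda) - M\sqrt{2\lambda\varepsilon} - 2\lambda\sqrt{2\lambda\varepsilon}\,\|x_\lambda - y_\lambda\|_2 - 4\lambda^2\varepsilon\\
&\qquad - \frac{1}{\lambda}\|x_\lambda\|_2\sqrt{2\lambda\varepsilon} - \varepsilon - \frac{1}{\lambda}\|y_\lambda\|_2\sqrt{2\lambda\varepsilon} - \varepsilon.
\end{aligned}$$

Using $\|x_\lambda - y_\lambda\|_2 \le \sqrt{6M^2/\lambda}$ (Lemma 46), $\|x_\lambda\|_2 \le M$ and $\|y_\lambda\|_2 \le M$, we have

$$f_\lambda(\theta'_2, \theta'_3) \ge f_\lambda(x_\lambda, y_\lambda) - M\sqrt{2\lambda\varepsilon} - 2\lambda M\sqrt{12\varepsilon} - 4\lambda^2\varepsilon - 2M\sqrt{2\varepsilon/\lambda} - 2\varepsilon.$$

Since $\lambda \ge 2$, we have

$$f_\lambda(\theta'_2, \theta'_3) \ge f_\lambda(x_\lambda, y_\lambda) - 20M\lambda\sqrt{\varepsilon} - 10\lambda^2\varepsilon.$$

Let $z = \frac{\theta'_2 + \theta'_3}{2}$. Lemma 46 shows that

$$\begin{aligned}
\max_{x \in K_1 \cap K_2}\langle c, x\rangle &\le \max_{x \in K_1, y \in K_2} f_\lambda(x, y) + \frac{M^2}{\lambda}\\
&\le f_\lambda(\theta'_2, \theta'_3) + \frac{M^2}{\lambda} + 20M\lambda\sqrt{\varepsilon} + 10\lambda^2\varepsilon\\
&\le \langle c, z\rangle + \frac{20M^2}{\lambda} + 20\lambda^3\varepsilon,
\end{aligned}$$

because $f_\lambda(\theta'_2, \theta'_3) \le \langle c, z\rangle$ and $20M\lambda\sqrt{\varepsilon} \le \frac{10M^2}{\lambda} + 10\lambda^3\varepsilon$. Furthermore, we have

$$\begin{aligned}
\|z - x_\lambda\|_2 + \|z - y_\lambda\|_2 &\le \|\theta'_2 - x_\lambda\|_2 + \|\theta'_3 - y_\lambda\|_2 + \|\theta'_2 - \theta'_3\|_2\\
&\le 4\sqrt{2\lambda\varepsilon} + \sqrt{\frac{6M^2}{\lambda}}.
\end{aligned}$$

□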
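The representation (11.8) can be sanity-checked numerically. At $\theta^* = (x - y, x, y)$ the function $g_\lambda$ from (11.9) equals $f_\lambda(x, y)$. Because $g_\lambda$ is a convex quadratic in $\theta$ with zero gradient there, a perturbation $\Delta$ raises it by exactly $\frac{\lambda}{2}\|\Delta_1\|_2^2 + \frac{1}{2\lambda}(\|\Delta_2\|_2^2 + \|\Delta_3\|_2^2)$. The following is a hedged illustrative check; `g_lam` and `f_lam` are hypothetical helper names.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def f_lam(x, y, c, lam):
    """f_lambda(x, y) from (11.4)."""
    d = [a - b for a, b in zip(x, y)]
    return (0.5 * dot(c, x) + 0.5 * dot(c, y) - lam / 2 * dot(d, d)
            - dot(x, x) / (2 * lam) - dot(y, y) / (2 * lam))


def g_lam(x, y, t1, t2, t3, c, lam):
    """g_lambda from (11.9): linear in (x, y), convex quadratic in (t1, t2, t3)."""
    ax = [ci / 2 - lam * a - b / lam for ci, a, b in zip(c, t1, t2)]
    ay = [ci / 2 + lam * a - b / lam for ci, a, b in zip(c, t1, t3)]
    return (dot(ax, x) + dot(ay, y) + lam / 2 * dot(t1, t1)
            + (dot(t2, t2) + dot(t3, t3)) / (2 * lam))
```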









We now apply our cutting plane method to solve the optimization problem (11.1). First we show how to transform the optimization oracles for $K_1$ and $K_2$ into a separation oracle for $h_\lambda$ with the appropriate parameters.

Lemma 48. Suppose we have an $\varepsilon$-optimization oracle for $K_1$ and $K_2$ for some $0 < \varepsilon < 1$. Then on the set $\{\|\theta\|_2 \le D\}$, we have an $(O(\sqrt{\varepsilon\lambda D}), O(\sqrt{\varepsilon\lambda D}))$-separation oracle for $h_\lambda$ with time complexity $\mathrm{OO}_\varepsilon(K_1) + \mathrm{OO}_\varepsilon(K_2)$.

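Proof. Recall that the function $h_\lambda$ is defined by

$$h_\lambda(\theta_1, \theta_2, \theta_3) = \max_{x \in K_1}\Big\langle \frac{c}{2} - \lambda\theta_1 - \frac{\theta_2}{\lambda},\, x\Big\rangle + \max_{y \in K_2}\Big\langle \frac{c}{2} + \lambda\theta_1 - \frac{\theta_3}{\lambda},\, y\Big\rangle + \frac{\lambda}{2}\|\theta_1\|_2^2 + \frac{1}{2\lambda}\|\theta_2\|_2^2 + \frac{1}{2\lambda}\|\theta_3\|_2^2.$$

Lemma 40 shows how to compute the subgradient of functions of the form $f(c) = \max_{x \in K}\langle c, x\rangle$ using the optimization oracle for $K$. The remaining terms are differentiable, so their subgradient is just the gradient. Hence, by the addition rule for subgradients (Fact 37), we have an $O(\varepsilon\lambda)$-subgradient oracle for $h_\lambda$ using an $O(\varepsilon)$-optimization oracle for $K_1$ and $K_2$. The result then follows from Lemma 38. □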
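The proof of Lemma 48 amounts to Danskin's rule: call each linear optimization oracle once at the current $\theta$, then differentiate $g_\lambda$ in $\theta$ at the returned $(x, y)$. The following minimal sketch assumes exact oracles and, as a test fixture only, takes $K_1$ and $K_2$ to be boxes. `box_linear_oracle` and `h_lam_with_subgradient` are hypothetical names.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def box_linear_oracle(lo, hi):
    """Exact linear optimization oracle for the box [lo, hi]: argmax_x <d, x>."""
    return lambda d: [h if di > 0 else l for di, l, h in zip(d, lo, hi)]


def h_lam_with_subgradient(t1, t2, t3, c, lam, opt1, opt2):
    """Value of h_lambda and a subgradient, using one call to each oracle."""
    dx = [ci / 2 - lam * a - b / lam for ci, a, b in zip(c, t1, t2)]
    dy = [ci / 2 + lam * a - b / lam for ci, a, b in zip(c, t1, t3)]
    x, y = opt1(dx), opt2(dy)
    value = (dot(dx, x) + dot(dy, y) + lam / 2 * dot(t1, t1)
             + (dot(t2, t2) + dot(t3, t3)) / (2 * lam))
    # Danskin: gradient of g_lambda with respect to theta at the maximizing (x, y)
    g1 = [lam * (a - xi + yi) for a, xi, yi in zip(t1, x, y)]
    g2 = [(b - xi) / lam for b, xi in zip(t2, x)]
    g3 = [(b - yi) / lam for b, yi in zip(t3, y)]
    return value, (g1, g2, g3)
```

With inexact ($\varepsilon$-)oracles the same code returns an approximate subgradient, which is what Lemma 38 then converts into a separation oracle.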

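Theorem 49. Assume (11.2) and (11.3). Suppose that we have an $\varepsilon$-optimization oracle for every $\varepsilon > 0$. For $0 < \delta < 1$, we can find $z \in \mathbb{R}^n$ such that

$$\max_{x \in K_1 \cap K_2}\langle c, x\rangle \le \delta + \langle c, z\rangle$$

and $\|z - x\|_2 + \|z - y\|_2 \le \delta$ for some $x \in K_1$ and $y \in K_2$ in time

$$O\Big(n\big(\mathrm{OO}_\eta(K_1) + \mathrm{OO}_\eta(K_2)\big)\log\frac{nM}{\delta} + n^3\log^{O(1)}\frac{nM}{\delta}\Big)$$

where $\eta = (\delta/(nM))^{\Theta(1)}$.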

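Proof. Setting $\lambda = \Theta(M^2/\delta^2)$ and $\varepsilon = \Theta(\delta/\lambda^3)$ with suitable constants in Lemma 47, we see that as long as we obtain any approximate solution $(\theta'_1, \theta'_2, \theta'_3)$ such that

$$h_\lambda(\theta'_1, \theta'_2, \theta'_3) \le \min_{(\theta_1, \theta_2, \theta_3) \in \Omega} h_\lambda(\theta_1, \theta_2, \theta_3) + \varepsilon,$$

we obtain the point we want. To apply Theorem 42, we use

$$h(\theta_1, \theta_2, \theta_3) = \begin{cases} h_\lambda(\theta_1, \theta_2, \theta_3) & \text{if } (\theta_1, \theta_2, \theta_3) \in \Omega,\\ +\infty & \text{else.}\end{cases}$$

Lemma 48 shows that for any $\gamma > 0$ we can obtain a $(\gamma, \gamma)$-separation oracle of $h_\lambda(\theta)$ by using sufficiently accurate optimization oracles. Since $\Omega$ is just a product of balls, we can easily produce a separating hyperplane when $(\theta_1, \theta_2, \theta_3) \notin \Omega$. Hence, we can obtain a $(\gamma, \gamma)$-separation oracle of $h(\theta)$. For simplicity, we use $\theta$ to represent $(\theta_1, \theta_2, \theta_3)$. Note that $B_\infty(2M) \supseteq \Omega$, and therefore we can apply Theorem 42 with $R = 2M$ to compute $\theta'$ such that

$$h(\theta') - \min_{\theta \in \Omega} h(\theta) \le \gamma + \alpha\Big(\max_{\theta \in \Omega} h(\theta) - \min_{\theta \in \Omega} h(\theta)\Big)$$

in time $O(n\,\mathrm{SO}_{\gamma,\gamma}(h)\log(n\kappa/\alpha) + n^3\log^{O(1)}(n\kappa/\alpha))$, where $\gamma = \Theta(\alpha\,\mathrm{MinWidth}(\Omega)/n^{3/2})$ and $\kappa = 2M/\mathrm{MinWidth}(\Omega) = \Theta(1)$. Using $\lambda \ge 1$ and $M \ge 1$, we have

$$\max_{\theta \in \Omega} h(\theta) - \min_{\theta \in \Omega} h(\theta) \le O(\lambda M^2).$$

Setting $\alpha = \Theta(\varepsilon/(\lambda M^2))$ with a small enough constant, we can find $\theta'$ such that

$$h_\lambda(\theta') \le \min_{\theta \in \Omega} h_\lambda(\theta) + \gamma + \alpha\, O(\lambda M^2) \le \min_{\theta \in \Omega} h_\lambda(\theta) + \varepsilon$$

in time $O(n\,\mathrm{SO}_{\gamma,\gamma}(h)\log(nM/\delta) + n^3\log^{O(1)}(nM/\delta))$, where $\gamma = (\delta/(nM))^{\Theta(1)}$. Lemma 48 shows that the cost of the $(\gamma, \gamma)$-separation oracle is just $O(\mathrm{OO}_\eta(K_1) + \mathrm{OO}_\eta(K_2))$ where $\eta = (\delta/(nM))^{\Theta(1)}$. □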
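The extended function $h$ needs a separation oracle on all of $B_\infty(2M)$. Outside $\Omega$, the outward normal of a violated ball cuts off the query point. Inside $\Omega$, a subgradient of $h_\lambda$ does the job. The following sketch shows that dispatch. The callable `h_subgradient` stands in for the Lemma 48 oracle, and both it and `separation_for_h` are hypothetical names.

```python
import math


def separation_for_h(theta, radius, h_subgradient):
    """theta = (theta1, theta2, theta3); Omega is the product of three Euclidean balls
    of the given radius. Returns a vector a such that every minimizer theta_opt of h
    over Omega satisfies <a, theta_opt - theta> <= 0."""
    for k, block in enumerate(theta):
        norm = math.sqrt(sum(v * v for v in block))
        if norm > radius:
            # theta violates ||theta_k|| <= radius: cut with that ball's outward normal
            a = [[0.0] * len(bl) for bl in theta]
            a[k] = [v / norm for v in block]
            return a
    # theta lies in Omega, where h = h_lambda: a subgradient of h_lambda separates
    return h_subgradient(theta)
```

Checking the balls first keeps the expensive optimization oracles out of the loop for infeasible query points.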

Remark 50. Note that the algorithm does not promise a point close to $K_1 \cap K_2$. It only promises a point that is close to both some point in $K_1$ and some point in $K_2$. It appears to the authors that a further assumption is needed to get a point close to $K_1 \cap K_2$. For example, if $K_1$ and $K_2$ are two almost parallel lines, it would be difficult to get an algorithm that does not depend on the angle. However, as far as we know, most algorithms tackling this problem are pseudo-polynomial and have polynomial dependence on the angle. Our algorithm depends on the logarithm of the angle, which is useful for combinatorial problems.

This reduction is very useful for problems in many areas, including linear programming, semidefinite programming and algorithmic game theory. In the remainder of this section we demonstrate its power by applying it to classical combinatorial problems.

There is, however, one issue with applying our cutting plane algorithm to these problems. As with other convex optimization methods, only an approximately optimal solution is found. On the other hand, an exact solution is typically required in combinatorial optimization. To overcome this gap, we introduce the following lemma, which (1) transforms the objective function so that there is only one optimal solution and (2) shows that an approximate solution is close to the optimal solution whenever it is unique. As we shall see in the next two subsections, this allows us to round an approximate solution to an optimal one.

Lemma 51. Consider a linear program $\min_{Ax \ge b} c^\top x$ where $c \in \mathbb{Z}^n$, $b \in \mathbb{Z}^m$ and $A$ is an $m \times n$ matrix. Suppose $\{Ax \ge b\}$ is an integral polytope (i.e. all extreme points are integral) contained in the set $\{x : \|x\|_\infty \le M\}$. Then we can find a random cost vector $z \in \mathbb{Z}^n$ with $\|z\|_\infty \le O(n^2M^2\|c\|_\infty)$ such that, with constant probability, $\min_{Ax \ge b} z^\top x$ has a unique minimizer $x^*$ and this minimizer is one of the minimizer(s) of $\min_{Ax \ge b} c^\top x$. Furthermore, if there is an interior point $y$ such that $z^\top y < \min_{Ax \ge b} z^\top x + \delta$, then $\|y - x^*\|_\infty \le 2nM\delta$.
Proof. The first part of the lemma follows by randomly perturbing the cost vector $c$. We consider a new cost vector $z = 100n^2M^2c + r$, where each coordinate of $r$ is sampled uniformly at random from $\{0, 1, \dots, 10nM\}$. [67, Lem 4] shows that the linear program $\min_{Ax \ge b} z^\top x$ has a unique minimizer with constant probability. Furthermore, the minimizer of $\min_{Ax \ge b} z^\top x$ is a minimizer of $\min_{Ax \ge b} c^\top x$, since $|r^\top x| \le 10n^2M^2$ for every feasible $x$, which is small compared to the factor $100n^2M^2$ multiplying the integral values $c^\top x$ at the vertices.

Now we show the second part of the lemma. We may assume $\delta \le 1$, as otherwise the claim is trivial. Given an interior point $y$ of the polytope $\{Ax \ge b\}$, we can write $y$ as a convex combination of the vertices of $\{Ax \ge b\}$, i.e. $y = \sum_i t_i v_i$. Note that $z^\top y = \sum_i t_i z^\top v_i$. If none of the $v_i$ is the minimizer, then $z^\top v_i \ge \mathrm{OPT} + 1$ for all $i$ and hence $z^\top y \ge \mathrm{OPT} + 1$, which is impossible. Hence, we can assume that $v_1$ is the minimizer, so $z^\top v_i = \mathrm{OPT}$ if $i = 1$ and $z^\top v_i \ge \mathrm{OPT} + 1$ otherwise. We then have $z^\top y \ge \mathrm{OPT} + (1 - t_1)$, which gives $1 - t_1 < \delta$. Finally, the claim follows from $\|y - x^*\|_\infty \le \sum_{i \ge 2} t_i\|v_i - v_1\|_\infty \le 2M(1 - t_1) \le 2nM\delta$. □
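The perturbation in Lemma 51 can be tried on a toy integral polytope. The sketch below assumes the unit cube $[0,1]^n$ (so $M = 1$) and enumerates its vertices. It checks that every minimizer of the perturbed cost $z = 100n^2M^2c + r$ is a minimizer of $c$, and it estimates how often the $z$-minimizer is unique. The names `perturbed_cost`, `argmins` and `uniqueness_rate` are hypothetical, and brute-force enumeration only makes sense for tiny $n$.

```python
import itertools
import random


def perturbed_cost(c, M, rng):
    """z = 100 n^2 M^2 c + r with r_i uniform on {0, 1, ..., 10 n M} (Lemma 51)."""
    n = len(c)
    return [100 * n * n * M * M * ci + rng.randint(0, 10 * n * M) for ci in c]


def argmins(cost, points):
    """All points attaining the minimum of <cost, p> (exact integer arithmetic)."""
    vals = [sum(a * b for a, b in zip(cost, p)) for p in points]
    best = min(vals)
    return [p for p, v in zip(points, vals) if v == best]


def uniqueness_rate(c, trials=200, seed=0):
    """Toy check on [0, 1]^n (M = 1): fraction of draws with a unique z-minimizer.
    Raises if some z-minimizer is not a c-minimizer."""
    pts = list(itertools.product((0, 1), repeat=len(c)))
    c_opt = set(argmins(c, pts))
    rng = random.Random(seed)
    unique = 0
    for _ in range(trials):
        z_opt = argmins(perturbed_cost(c, 1, rng), pts)
        if not set(z_opt) <= c_opt:
            raise AssertionError("perturbation changed the optimal face")
        unique += len(z_opt) == 1
    return unique / trials
```

For $c = (0, 0, 1, -1)$, four cube vertices are optimal for $c$. After perturbation, ties remain only when a random coordinate for a zero-cost entry happens to be $0$.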

11.2 Matroid Intersection 

Let $M_1 = (E, \mathcal{I}_1)$ and $M_2 = (E, \mathcal{I}_2)$ be two matroids sharing the same ground set. In this section we consider the weighted matroid intersection problem

$$\min_{S \in \mathcal{I}_1 \cap \mathcal{I}_2} w(S),$$

where $w \in \mathbb{R}^E$ and $w(S) = \sum_{e \in S} w_e$.

For any matroid $M = (E, \mathcal{I})$, it is well known that the polytope of all independent sets has the following description [28]:

$$\mathrm{conv}(\mathcal{I}) = \{x \in \mathbb{R}^E \text{ s.t. } 0 \le x(S) \le r(S) \text{ for all } S \subseteq E\} \tag{11.12}$$

where $r$ is the rank function of $M$, i.e. $r(S)$ is the size of the largest independent set that is a subset of $S$. Furthermore, the polytope of the matroid intersection satisfies $\mathrm{conv}(\mathcal{I}_1 \cap \mathcal{I}_2) = \mathrm{conv}(\mathcal{I}_1) \cap \mathrm{conv}(\mathcal{I}_2)$.

It is well known that the optimization problems

$$\min_{S \in \mathcal{I}_1} w(S) \quad\text{and}\quad \min_{S \in \mathcal{I}_2} w(S)$$

can be solved efficiently by the greedy method. Given a matroid (polytope), the greedy method finds a maximum weight independent subset by maintaining a candidate independent subset $S$ and iteratively attempting to add new elements to $S$ in descending order of weight. An element $i$ is added to $S$ if $S \cup \{i\}$ is still independent. A proof of correctness of this algorithm is well known and can be found in any standard textbook on combinatorial optimization.

Clearly, the greedy method can be implemented with $O(n)$ calls to the independence oracle (also called the membership oracle). With the rank oracle, it requires $O(r\log n)$ calls, by finding the next element to add via binary search. Therefore, we can apply Theorem 49 to get the following result (note that this algorithm is the fastest if $r$ is close to $n$ for the independence oracle).

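Theorem 52. Suppose that the weights $w$ are integers with $\|w\|_\infty \le M$. Then we can find

$$S \in \operatorname*{argmin}_{S \in \mathcal{I}_1 \cap \mathcal{I}_2} w(S)$$

in time $O(n\,\mathrm{GO}\log(nM) + n^3\log^{O(1)}(nM))$, where GO is the cost of the greedy method for $\mathcal{I}_1$ and $\mathcal{I}_2$.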
Proof. Applying Lemma 51, we can find a new cost $z$ such that

$$\min_{x \in \mathrm{conv}(\mathcal{I}_1) \cap \mathrm{conv}(\mathcal{I}_2)} z^\top x$$

has a unique solution. Note that for any $x \in \mathrm{conv}(\mathcal{I}_1)$, we have $\|x\|_\infty \le 1$. Hence, applying Theorem 49, we can find $q$ such that $q^\top z \le \mathrm{OPT} + \varepsilon$ and $\|q - x\|_2 + \|q - y\|_2 \le \varepsilon$ for some $x \in \mathrm{conv}(\mathcal{I}_1)$ and $y \in \mathrm{conv}(\mathcal{I}_2)$. Using (11.12), the coordinate-wise minimum of $x$ and $y$, i.e. $\min\{x, y\}$, is in $\mathrm{conv}(\mathcal{I}_1) \cap \mathrm{conv}(\mathcal{I}_2)$. Since $\|q - \min\{x, y\}\|_2 \le \|q - x\|_2 + \|q - y\|_2 \le \varepsilon$, we have

$$(\min\{x, y\})^\top z \le \mathrm{OPT} + nM\varepsilon.$$

Hence, $\min\{x, y\}$ is a feasible point whose value is close to optimal, and Lemma 51 shows that $\|\min\{x, y\} - s\|_\infty \le 2n^2M^2\varepsilon$, where $s$ is the optimal solution. Hence, we have $\|q - s\|_\infty \le 2n^2M^2\varepsilon + \varepsilon$. Picking $\varepsilon$ small enough that $2n^2M^2\varepsilon + \varepsilon < \frac12$, i.e. $\varepsilon = \Theta(1/(n^2M^2))$, we can obtain the optimal solution by rounding $q$ to the nearest integer.

Since optimization over $\mathrm{conv}(\mathcal{I}_1)$ and $\mathrm{conv}(\mathcal{I}_2)$ involves applying the greedy method to certain vectors, it takes only $O(\mathrm{GO})$ time. Theorem 49 shows that it takes only $O(n\,\mathrm{GO}\log(nM) + n^3\log^{O(1)}(nM))$ time to find such a $q$. □

This gives the following corollary. 

Corollary 53. We have $O(n^2T_{\mathrm{ind}}\log(nM) + n^3\log^{O(1)}nM)$ and $O(nrT_{\mathrm{rank}}\log n\log(nM) + n^3\log^{O(1)}nM)$ time algorithms for weighted matroid intersection. Here $T_{\mathrm{ind}}$ is the time needed to check whether a subset is independent, and $T_{\mathrm{rank}}$ is the time needed to compute the rank of a given subset.

Proof. By Theorem 52, it suffices to show that the optimization oracle for the matroid polytope can be implemented in $O(nT_{\mathrm{ind}})$ and $O(rT_{\mathrm{rank}}\log n)$ time. This is attained by the well-known greedy algorithm, which iterates through all the positively weighted elements in decreasing order of weight and adds an element to the candidate independent set whenever possible.

For the independence oracle, this involves one oracle call for each element. For the rank oracle, on the other hand, we can find the next element to add by binary search, which takes time $O(T_{\mathrm{rank}}\log n)$. Since there are at most $r$ elements to add, we obtain the desired running time. □
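Both oracle implementations in the proof can be sketched as follows. With a rank oracle, the element the greedy method adds next is the first position $j$ at which the rank of the sorted prefix exceeds the current set size. Since prefix ranks are nondecreasing, $j$ can be found by binary search with $O(\log n)$ rank queries per added element. In this sketch, matroids are given as Python callables, and the graphic-matroid rank oracle is only an example fixture. All function names are hypothetical.

```python
def _descending_positive(elements, weight):
    return sorted((e for e in elements if weight[e] > 0), key=lambda e: -weight[e])


def greedy_independence(elements, weight, is_independent):
    """Greedy max-weight independent set: one independence query per element."""
    chosen = []
    for e in _descending_positive(elements, weight):
        if is_independent(chosen + [e]):
            chosen.append(e)
    return chosen


def greedy_rank(elements, weight, rank):
    """Same output using a rank oracle: O(log n) rank queries per chosen element."""
    order = _descending_positive(elements, weight)
    chosen, start = [], 0
    # invariant: len(chosen) == rank(order[:start])
    while start < len(order) and rank(order) > len(chosen):
        lo, hi = start, len(order) - 1
        while lo < hi:  # smallest j >= start with rank(order[:j + 1]) > len(chosen)
            mid = (lo + hi) // 2
            if rank(order[:mid + 1]) > len(chosen):
                hi = mid
            else:
                lo = mid + 1
        chosen.append(order[lo])
        start = lo + 1
    return chosen


def graphic_rank(edges, num_vertices):
    """Rank oracle of a graphic matroid (example fixture): size of a spanning forest."""
    def rank(names):
        parent = list(range(num_vertices))

        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]
                a = parent[a]
            return a

        r = 0
        for name in names:
            u, v = edges[name]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                r += 1
        return r
    return rank
```

On a 4-vertex graph with edge weights $a{:}5, b{:}4, c{:}3, d{:}2, e{:}1$, both versions return the maximum-weight spanning forest $\{a, b, d\}$.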

11.3 Submodular Flow 

Let $G = (V, E)$ be a directed graph with $|E| = m$, let $f$ be a submodular function on $2^V$ with $|V| = n$, $f(\emptyset) = 0$ and $f(V) = 0$, and let $A$ be the incidence matrix of $G$. In this section we consider the submodular flow problem

$$\begin{aligned}
\text{Minimize}\quad & \langle c, \varphi\rangle \qquad (11.13)\\
\text{subject to}\quad & l(e) \le \varphi(e) \le u(e) \quad \forall e \in E,\\
& x(v) = (A\varphi)(v) \quad \forall v \in V,\\
& \textstyle\sum_{v \in S} x(v) \le f(S) \quad \forall S \subseteq V,
\end{aligned}$$

where $c \in \mathbb{Z}^E$, $l \in \mathbb{Z}^E$, $u \in \mathbb{Z}^E$, $C = \|c\|_\infty$ and $U = \max(\|u\|_\infty, \|l\|_\infty, \max_{S \subseteq V}|f(S)|)$. Here $c$ is the cost on edges, $\varphi$ is the flow on edges, $l$ and $u$ are lower and upper bounds on the amount of flow on the edges, and $x(v)$ is the net flow out of vertex $v$. The submodular function $f$ upper bounds the total net flow out of any subset $S$ of vertices by $f(S)$.
Theorem 54. Suppose that the cost vector $c$ is an integer vector with $\|c\|_\infty \le C$, and that the capacity vectors and the submodular function satisfy $U = \max(\|u\|_\infty, \|l\|_\infty, \max_{S \subseteq V}|f(S)|)$. Then we can solve the submodular flow problem (11.13) in time $O(n^2\,\mathrm{EO}\log(mCU) + n^3\log^{O(1)}(mCU))$, where EO is the cost of function evaluation.

Proof. First, we can assume $l(e) \le u(e)$ for every edge $e$; otherwise, the problem is infeasible. Now, we apply a transformation similar to that in [49] to modify the graph. We create a new vertex $v_0$. For every vertex $v$ in $V$, we create an edge from $v_0$ to $v$ with capacity lower bounded by $0$, upper bounded by $4nU$, and with cost $2mCU$. Edmonds and Giles showed that the submodular flow polytope is integral [29]. Hence, there is an integral optimal flow on this new graph. If the optimal flow passes through a newly created edge, then it has cost at least $2mCU - mCU$, because the total cost of all other edges is at least $-mCU$. That means the optimal flow has cost larger than $mCU$, which is impossible. So the optimal flow does not use the newly created edges and vertex, and hence the optimal flow in the new problem gives the optimal solution of the original problem. Next, we note that for any $\varphi$ on the original graph such that $l(e) \le \varphi(e) \le u(e)$, we can send a suitable amount of flow from $v_0$ to $v$ to make $\varphi$ feasible. Hence, this modification makes the feasibility problem trivial.

Lemma 51 shows that we can assume the new problem has a unique solution, and this only blows up $C$ by a $(mU)^{O(1)}$ factor.

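Note that the optimal value is an integer and its absolute value is at most $mCU$. By binary search, we can assume we know the optimal value OPT. Now, we reduce the problem to finding a feasible $\varphi$ with $\langle d, \varphi\rangle \le \mathrm{OPT} + \varepsilon$, where $d$ denotes the cost vector after the perturbation of Lemma 51 and $\varepsilon$ is determined later. Let $P_\varepsilon$ be the set of vectors $x$ arising from such feasible $\varphi$. Note that $P_\varepsilon = K_{1,\varepsilon} \cap K_{2,\varepsilon}$, where

$$K_{1,\varepsilon} = \Big\{x \in \mathbb{R}^V \text{ such that } l(e) \le \varphi(e) \le u(e)\ \forall e \in E,\ x(v) = (A\varphi)(v)\ \forall v \in V \text{ for some } \varphi \text{ with } \langle d, \varphi\rangle \le \mathrm{OPT} + \varepsilon\Big\},$$

$$K_{2,\varepsilon} = \Big\{y \in \mathbb{R}^V \text{ such that } \sum_{v \in S} y(v) \le f(S)\ \forall S \subseteq V \text{ and } \sum_{v \in V} y(v) = f(V)\Big\}.$$

Note that the extra condition $\sum_{v \in V} y(v) = f(V)$ is valid because $\sum_{v \in V} y(v) = \sum_{v \in V}(A\varphi)(v) = 0 = f(V)$. The set $K_{1,\varepsilon}$ has radius bounded by $O((mCU)^{O(1)})$, and $K_{2,\varepsilon}$ has radius bounded by $O(nU)$. Furthermore, for any vector $c \in \mathbb{R}^V$, we note that

$$\max_{x \in K_{1,\varepsilon}}\langle c, x\rangle = \max_{l \le \varphi \le u,\ \langle d, \varphi\rangle \le \mathrm{OPT} + \varepsilon}\langle c, A\varphi\rangle = \max_{l \le \varphi \le u,\ \langle d, \varphi\rangle \le \mathrm{OPT} + \varepsilon}\langle A^\top c, \varphi\rangle.$$

To solve this problem, we can again do a binary search on $\langle d, \varphi\rangle$ and reduce the problem to

$$\max_{l \le \varphi \le u,\ \langle d, \varphi\rangle = K}\langle A^\top c, \varphi\rangle$$

for some value of $K$. Since $A^\top c$ is fixed, this is a linear program with only box constraints and one extra equality constraint. Hence, it can be solved in nearly linear time [76, Thm 17, ArXiv v1]. As the optimization oracle for $K_{1,\varepsilon}$ involves only computing $A^\top c$ and solving this simple linear program, it takes only $O(n^2\log^{O(1)}(mCU/\varepsilon))$ time. On the other hand, since $K_{2,\varepsilon}$ is just a base polyhedron, the optimization oracle for $K_{2,\varepsilon}$ can be implemented by the greedy method and takes only $O(n\,\mathrm{EO})$ time.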
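The linear program behind the $K_{1,\varepsilon}$ oracle, $\max \langle w, \varphi\rangle$ subject to $l \le \varphi \le u$ and $\langle d, \varphi\rangle = K$, has a simple structure. For a Lagrange multiplier $\mu$, each coordinate sits at an endpoint determined by the sign of $w_e - \mu d_e$. Sweeping $\mu$ upward through the breakpoints $w_e/d_e$ lowers $\langle d, \varphi\rangle$ monotonically until it hits $K$. The following sketch uses a simple $O(m\log m)$ sort-based sweep. It is a stand-in under that assumption, not necessarily the nearly linear time method of [76], and `box_lp_one_equality` is a hypothetical name.

```python
def box_lp_one_equality(w, d, lo, hi, K, tol=1e-12):
    """max <w, phi>  s.t.  lo <= phi <= hi  and  <d, phi> = K.
    Sweeps the multiplier mu upward through the breakpoints w_e / d_e.
    Returns None if the program is infeasible."""
    n = len(w)
    phi = [0.0] * n
    active = []
    for e in range(n):
        if d[e] == 0:
            phi[e] = hi[e] if w[e] > 0 else lo[e]  # coordinate does not touch the constraint
        else:
            active.append(e)
    high = {e: (hi[e] if d[e] > 0 else lo[e]) for e in active}  # optimal as mu -> -inf
    low = {e: (lo[e] if d[e] > 0 else hi[e]) for e in active}   # optimal as mu -> +inf
    for e in active:
        phi[e] = high[e]
    s = sum(d[e] * phi[e] for e in active)        # current value of <d, phi>
    s_min = sum(d[e] * low[e] for e in active)
    if K > s + tol or K < s_min - tol:
        return None
    for e in sorted(active, key=lambda e: w[e] / d[e]):
        if s - K <= tol:
            break
        drop = d[e] * (high[e] - low[e])          # >= 0: how far <d, phi> can fall
        if s - drop >= K:
            phi[e], s = low[e], s - drop
        else:
            phi[e], s = high[e] - (s - K) / d[e], K  # fractional on the breakpoint edge
            break
    return phi
```

Coordinates sharing a breakpoint have zero reduced cost at the final multiplier, so splitting the remaining slack among them in any order stays optimal.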

Applying Theorem 49, we can find $q$ such that $\|q - x\|_2 + \|q - y\|_2 \le \delta$ for some $x \in K_{1,\varepsilon}$ and $y \in K_{2,\varepsilon}$, with $\delta$ to be chosen later. By the definition of $K_{1,\varepsilon}$, there is a $\varphi$ such that $l(e) \le \varphi(e) \le u(e)$, $x(v) = (A\varphi)(v)$ for all $v$, and $\langle d, \varphi\rangle \le \mathrm{OPT} + \varepsilon$. Since $\|y - x\|_2 \le 2\delta$, we have $|y(v) - (A\varphi)(v)| \le 2\delta$ for all $v$.

• Case 1) If $y(v) > (A\varphi)(v)$, then we can replace $y(v)$ by $(A\varphi)(v)$; note that $y$ is still in $K_{2,\varepsilon}$ because of the submodular constraints.

• Case 2) If $y(v) < (A\varphi)(v)$, then we can send a suitable amount of flow from $v_0$ to $v$ to make $\varphi$ feasible, i.e. so that $(A\varphi)(v) = y(v)$.

Under this modification, we increase the objective value by at most $(2\delta n)(2mCU)$, because the new edges cost $2mCU$ per unit of flow. Hence, we find a flow $\varphi$ which is feasible in the new graph and whose objective value is within $\varepsilon + (2\delta n)(2mCU)$ of the optimum value. By picking $\delta = \frac{\varepsilon}{4mnCU}$, the value is within $2\varepsilon$ of OPT. Now, we use Lemma 51 to show that when $\varepsilon$ is small enough, i.e. $\varepsilon \le (mCU)^{-c}$ for some constant $c$, we can guarantee that $\|y - x^*\|_\infty$ is at most a small constant, where $x^*$ is the optimal demand. Since $\|q - y\|_2 \le \delta$ and we only modified $y$ by a small amount, we in fact have $\|q - x^*\|_\infty < \frac12$. Hence, we can read off the solution $x^*$ by rounding $q$ to the nearest integer.

Note that we only need to solve the problem $K_{1,\varepsilon} \cap K_{2,\varepsilon}$ to accuracy $(mCU)^{-O(1)}$, and the optimization oracles for $K_{1,\varepsilon}$ and $K_{2,\varepsilon}$ take time $O(n^2\log^{O(1)}(mCU))$ and $O(n\,\mathrm{EO})$, respectively. Hence, Theorem 49 shows that it takes $O(n^2\,\mathrm{EO}\log(mCU) + n^3\log^{O(1)}(mCU))$ time to find $x^*$ exactly.

After getting $x^*$, one can find $\varphi^*$ by solving a min cost flow problem using the interior point method of [74], which takes $O(m\sqrt{n}\log^{O(1)}(mCU))$ time. □

11.4 Affine Subspace of Convex Set

In this section, we give another example of using the optimization oracle directly via regularization. We consider the following optimization problem

$$\max_{x \in K \text{ and } Ax = b}\langle c, x\rangle \tag{11.14}$$

where $x, c \in \mathbb{R}^n$, $K$ is a convex subset of $\mathbb{R}^n$, $A \in \mathbb{R}^{r \times n}$ and $b \in \mathbb{R}^r$. We suppose that $r \ll n$; the goal of this subsection is to show how to obtain an algorithm that takes only $\tilde O(r)$ iterations. To do this, we assume a slightly stronger optimization oracle for $K$:

Definition 55. Given a convex set $K$ and $\delta > 0$, a $\delta$-2nd-order-optimization oracle for $K$ is a function on $\mathbb{R}^n$ such that for any input $c \in \mathbb{R}^n$ and $\lambda > 0$, it outputs $y$ such that

$$\max_{x \in K}\big(\langle c, x\rangle - \lambda\|x\|_2^2\big) \le \delta + \langle c, y\rangle - \lambda\|y\|_2^2.$$

We denote by $\mathrm{OO}^{(2)}_\delta(K)$ the time complexity of this oracle.
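As a concrete instance of Definition 55, take $K$ to be a box $[l, h]$; this choice of $K$ is an assumption made only for illustration. Then the oracle is exact ($\delta = 0$) and costs $O(n)$. The objective splits into concave parabolas $c_i x_i - \lambda x_i^2$, each maximized at $c_i/(2\lambda)$ and then clipped to $[l_i, h_i]$. `second_order_oracle_box` is a hypothetical name.

```python
def second_order_oracle_box(c, lam, lo, hi):
    """Exact maximizer of <c, x> - lam * ||x||_2^2 over the box lo <= x <= hi.
    Each term c_i x_i - lam x_i^2 is a concave parabola with peak at c_i / (2 lam),
    so clipping the peak to [lo_i, hi_i] maximizes it coordinate by coordinate."""
    return [min(max(ci / (2.0 * lam), l), h) for ci, l, h in zip(c, lo, hi)]
```

This is the same box $K = \{x : 0 \le x_i \le 1\}$ that appears in the special case discussed in Remark 57.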

The strategy for solving this problem is very similar to the intersection problem and hence some 
details are omitted. 
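Theorem 56. Assume that $\|x\|_2 \le M$ for all $x \in K$, $\|b\|_2 \le M$, $\|c\|_2 \le M$, $\|A\|_2 \le M$ and $\lambda_{\min}(A) \ge 1/M$. Assume that $K \cap \{Ax = b\} \ne \emptyset$ and that we have an $\varepsilon$-2nd-order-optimization oracle for every $\varepsilon > 0$. For $0 < \delta < 1$, we can find $z \in K$ such that

$$\max_{x \in K \text{ and } Ax = b}\langle c, x\rangle \le \delta + \langle c, z\rangle$$

and $\|Az - b\|_2 \le \delta$. This algorithm takes time

$$O\Big(r\,\mathrm{OO}^{(2)}_\eta(K)\log\frac{nM}{\delta} + r^3\log^{O(1)}\frac{nM}{\delta}\Big)$$

where $r$ is the number of rows in $A$ and $\eta = (\delta/(nM))^{\Theta(1)}$.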

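Proof. The proof is based on the minimax problem

$$\mathrm{OPT}_\lambda = \min_{\|\eta\|_2 \le \lambda}\ \max_{x \in K}\ \langle c, x\rangle + \langle \eta, Ax - b\rangle - \frac{1}{\lambda}\|x\|_2^2,$$

where $\lambda$ is a large parameter depending on some large constant $c$. We note that

$$\mathrm{OPT}_\lambda = \max_{x \in K}\ \min_{\|\eta\|_2 \le \lambda}\ \langle c, x\rangle + \langle \eta, Ax - b\rangle - \frac{1}{\lambda}\|x\|_2^2 = \max_{x \in K}\ \langle c, x\rangle - \lambda\|Ax - b\|_2 - \frac{1}{\lambda}\|x\|_2^2.$$

Since $\lambda_{\min}(A) \ge 1/M$ and the set $K$ is bounded by $M$, one can show that the saddle point $(x^*, \eta^*)$ of the minimax problem gives a good enough solution $x$ for the original problem for a large enough constant $c$.

For any $\eta$, we define

$$x_\eta = \operatorname*{argmax}_{x \in K}\ \langle c, x\rangle + \langle \eta, Ax - b\rangle - \frac{1}{\lambda}\|x\|_2^2.$$

Since the problem is strongly concave in $x$, one can prove that $x_\eta$ is close to $x^*$ whenever $\eta$ is an approximate minimizer of

$$f(\eta) = \max_{x \in K}\ \langle c, x\rangle + \langle \eta, Ax - b\rangle - \frac{1}{\lambda}\|x\|_2^2.$$

Hence, we can first find an approximate minimizer of $f$ and then use the oracle to find $x_\eta$.

To find an approximate minimizer of $f$, we note that the subgradient of $f$ can be found using the optimization oracle, similar to Theorem 49. Hence, the result follows from our cutting plane method and the fact that $\eta \in \mathbb{R}^r$. □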


Remark 57. In [74], the authors considered the special case $K = \{x : 0 \le x_i \le 1\}$ and showed that it can be solved in $\tilde O(\sqrt{r})$ iterations using interior point methods. This gives the current fastest algorithm for the maximum flow problem on directed weighted graphs. Our result generalizes their result to any convex set $K$, but with $\tilde O(r)$ iterations. This suggests the following open problem: under what conditions on $K$ can one optimize linear functions over affine subspaces of $K$ with $r$ constraints in $\tilde O(\sqrt{r})$ iterations?
Part III

Submodular Function Minimization 


12 Introduction 

Submodular functions and submodular function minimization (SFM) are fundamental to the field of combinatorial optimization. Examples of submodular functions include graph cut functions, set coverage functions, and utility functions from economics. Since the seminal work by Edmonds in 1970 [27], submodular functions and the problem of minimizing such functions (i.e. submodular function minimization) have served as a popular modeling and optimization tool in various fields such as theoretical computer science, operations research, game theory, and most recently, machine learning. Given its prevalence, fast algorithms for SFM are of immense interest both in theory and in practice.

Throughout Part III, we consider the standard formulation of SFM: we are given a submodular function $f$ defined over the subsets of an $n$-element ground set. The values of $f$ are integers, have absolute value at most $M$, and are evaluated by querying an oracle that takes time EO. Our goal is to produce an algorithm that solves this SFM problem, i.e. finds a minimizer of $f$, while minimizing both the number of oracle calls made and the total running time.

We provide new $O(n^2\log nM \cdot \mathrm{EO} + n^3\log^{O(1)} nM)$ and $O(n^3\log^2 n \cdot \mathrm{EO} + n^4\log^{O(1)} n)$ time algorithms for SFM. These algorithms improve upon the previous fastest weakly and strongly polynomial time algorithms for SFM, which had running times of $O((n^4 \cdot \mathrm{EO} + n^5)\log M)$ [54] and $O(n^5 \cdot \mathrm{EO} + n^6)$ [90], respectively. Consequently, we improve the running times in both regimes by roughly a factor of $O(n^2)$.

Both of our algorithms bear resemblance to the classic approach of Grotschel, Lovasz and 
Schrijver [49, 50] using the Lovasz extension. In fact our weakly polynomial time algorithm directly 
uses the Lovasz extension as well as the results of Part II to achieve these results. Our strongly 
polynomial time algorithm also uses the Lovasz extension, along with more modern tools from the 
past 15 years. 

At a high level, our strongly polynomial algorithms apply our cutting plane method in conjunction with techniques originally developed by Iwata, Fleischer, and Fujishige (IFF) [56]. Our cutting plane method is run for enough iterations to sandwich the feasible region in a narrow strip, from which useful structural information about the minimizers can be deduced. Our ability to derive this new information hinges on a significant extension of IFF techniques.

Over the past few decades, SFM has drawn considerable attention from various research communities, most recently in machine learning [11, 68]. Given this abundant interest in SFM, we hope that our ideas will be of value in various practical applications. Indeed, one of the critiques of existing theoretical algorithms is that their running time is too slow to be practical. Our contribution, on the contrary, shows that this school of algorithms can actually be made fast theoretically, and we hope it may potentially be competitive against the heuristics that are more commonly used.
12.1 Previous Work


Here we provide a brief survey of the history of algorithms for SFM. For a more comprehensive 
account of the rich history of SFM, we refer the readers to recent surveys [81, 55]. 

The first weakly and strongly polynomial time algorithms for SFM were based on the ellipsoid method [65] and were established in the foundational work of Grotschel, Lovasz and Schrijver in the 1980s [49, 50]. Their work was complemented by a landmark paper by Cunningham in 1985, which provided a pseudopolynomial algorithm following a flow-style algorithmic framework [20]. His tools foreshadowed much of the development in SFM that would take place 15 years later. Indeed, modern algorithms synthesize his framework with inspirations from various max flow algorithms.

The first such “flow style” strongly polynomial algorithms for SFM were discovered independently in the breakthrough papers by Schrijver [93] and Iwata, Fleischer, and Fujishige (IFF) [56]. Schrijver's algorithm has a running time of $O(n^8 \cdot \mathrm{EO} + n^9)$ and borrows ideas from the push-relabel algorithms [46, 25] for the maximum flow problem. On the other hand, IFF's algorithm runs in time $O(n^7\log n \cdot \mathrm{EO})$ and $O(n^5 \cdot \mathrm{EO}\log M)$, and applies a flow-scaling scheme with the aid of certain proximity-type lemmas as in the work of Tardos [100]. Their method has roots in flow algorithms such as [52, 47].

Subsequent work on SFM provided algorithms with considerably faster running times by extending the ideas in these two “genesis” papers [93, 56] in various novel directions [107, 31, 54, 90, 60]. Currently, the fastest weakly and strongly polynomial time algorithms for SFM have running times of $O((n^4 \cdot \mathrm{EO} + n^5)\log M)$ [54] and $O(n^5 \cdot \mathrm{EO} + n^6)$ [90], respectively. Despite this impressive track record, the running time has not been improved in the last eight years.

We remark that all of the previous algorithms for SFM proceed by maintaining a convex combination of $O(n)$ BFS's (basic feasible solutions) of the base polyhedron and incrementally improving it in a relatively local manner. As we shall discuss in Section 12.2, our algorithms do not explicitly maintain a convex combination. This may be one of the fundamental reasons why our algorithms achieve a faster running time.

Finally, beyond the distinction between weakly and strongly polynomial time algorithms for SFM, there has been interest in another type of SFM algorithm, known as fully combinatorial algorithms, in which only additions and subtractions are permitted. Previous such algorithms include [60, 54, 53]. We do not consider such algorithms in the remainder of the paper and leave it as an open question whether it is possible to turn our algorithms into fully combinatorial ones.

12.2 Our Results and Techniques 

In Part III we show how to improve upon the previous best known running times for SFM by a factor of $O(n^2)$ in both the strongly and weakly polynomial regimes. Table 11 summarizes the running times of the previous algorithms as well as the running times of the fastest algorithms presented in this paper.

Both our weakly and strongly polynomial algorithms for SFM utilize a convex relaxation of the submodular function, called the Lovasz extension. Our algorithms apply our cutting plane method from Part I using a separation oracle given by the subgradient of the Lovasz extension. To the best of the authors' knowledge, Grotschel, Lovasz and Schrijver were the first to formulate this convex optimization framework for SFM [49, 50].

For weakly polynomial algorithms, our contribution is two-fold. First, we show that cutting plane methods such as Vaidya's [105] can be applied to SFM to yield faster algorithms. Second, our cutting plane method, Theorem 42, improves upon previous cutting plane algorithms and consequently improves the running time for SFM as well. This gives a running time of $O(n^2 \log nM \cdot \mathrm{EO} + n^3 \log^{O(1)} nM)$, an improvement over the previous best algorithm by Iwata [54] by a factor of almost $O(n^2)$.

Authors | Years | Running times | Remarks
Grötschel, Lovász, Schrijver [49, 50] | 1981, 1988 | $O(n^5 \cdot \mathrm{EO} + n^7)$ [81] | first weakly and strongly
Cunningham [20] | 1985 | $O(Mn^6 \log nM \cdot \mathrm{EO})$ | first pseudopoly
Schrijver [93] | 2000 | $O(n^8 \cdot \mathrm{EO} + n^9)$ | first combin. strongly
Iwata, Fleischer, Fujishige [56] | 2000 | $O(n^5 \cdot \mathrm{EO} \log M)$; $O(n^7 \log n \cdot \mathrm{EO})$ | first combin. strongly
Iwata, Fleischer [31] | 2000 | $O(n^7 \cdot \mathrm{EO} + n^8)$ |
Iwata [54] | 2003 | $O((n^4 \cdot \mathrm{EO} + n^5) \log M)$; $O((n^6 \cdot \mathrm{EO} + n^7) \log n)$ | current best weakly
Vygen [107] | 2003 | $O(n^7 \cdot \mathrm{EO} + n^8)$ |
Orlin [90] | 2007 | $O(n^5 \cdot \mathrm{EO} + n^6)$ | current best strongly
Iwata, Orlin [60] | 2009 | $O((n^4 \cdot \mathrm{EO} + n^5) \log nM)$; $O((n^5 \cdot \mathrm{EO} + n^6) \log n)$ |
Our algorithms | 2015 | $O(n^2 \log nM \cdot \mathrm{EO} + n^3 \log^{O(1)} nM)$; $O(n^3 \log^2 n \cdot \mathrm{EO} + n^4 \log^{O(1)} n)$ |

Table 11: Algorithms for submodular function minimization. Note that some of these algorithms were published in both conferences and journals, in which case the year we provide is the earlier one.


Our strongly polynomial algorithms, on the other hand, require substantially more innovation. We first begin with a very simple geometric argument that SFM can be solved with $O(n^3 \log n \cdot \mathrm{EO})$ oracle calls (but in exponential time). This proof only uses Grünbaum's Theorem from convex geometry and is completely independent from the rest of the paper. It was the starting point of our method and suggests that a running time of $\tilde O(n^3 \cdot \mathrm{EO} + n^{O(1)})$ for submodular minimization is in principle achievable.

To make this existence result algorithmic, we first run the cutting plane method, Theorem 31, for enough iterations that we compute either a minimizer or a set $P$ containing the minimizers that fits within a narrow strip. This narrow strip consists of the intersection of two approximately parallel hyperplanes. If our narrow strip separates $P$ from one of the faces $x_i = 0$ or $x_i = 1$, we can effectively eliminate the element $i$ from our consideration and reduce the dimension of our problem by 1. Otherwise, a pair of elements $p, q$ can be identified for which $q$ is guaranteed to be in any minimizer containing $p$ (but $p$ may not be contained in a minimizer). Our first algorithm deduces only one such pair at a time. This technique immediately suffices to achieve an $\tilde O(n^4 \cdot \mathrm{EO} + n^5)$ time algorithm for SFM (see Section 15.3). We then improve the running time to $\tilde O(n^3 \cdot \mathrm{EO} + n^4)$ by showing how to deduce many such pairs simultaneously. Similar to past algorithms, this structural information is deduced from a point in the so-called base polyhedron (see Section 13).

Readers well-versed in the SFM literature may recognize that our strongly polynomial algorithms are reminiscent of the scaling-based approach first used by IFF [56] and later in [54, 60]. While both approaches share the same skeleton, there are differences as to how structural information about minimizers is deduced. A comparison of our algorithms and previous ones is presented in Section 16.

Finally, there is one more crucial difference between these algorithms which we believe is responsible for much of our speedup. One common feature shared by all the previous algorithms is that they maintain a convex combination of $O(n)$ BFS's of the base polyhedron, and incrementally improve on it by introducing new BFS's obtained by making local changes to existing ones. Our algorithms, on the other hand, choose new BFS's by the cutting plane method. Because of this, our algorithm considers the geometry of all the existing BFS's, each of which has influence over the choice of the next BFS. In some sense, our next BFS is chosen in a more “global” manner.

12.3 Organization 

The rest of Part III is organized as follows. We first begin with a gentle introduction to submodular functions in Section 13. In Section 14, we apply our cutting plane method to SFM to obtain a faster weakly polynomial algorithm. In Section 15 we then present our results for achieving better strongly polynomial algorithms, where a warm-up $\tilde O(n^4 \cdot \mathrm{EO} + n^5)$ algorithm is given before the full-fledged $\tilde O(n^3 \cdot \mathrm{EO} + n^4)$ algorithm. Finally, we end the part with a discussion and comparison between our algorithms and previous ones in Section 16.

We note that there are a few results in Part III that can be read fairly independently of the rest of the paper. In Theorem 61 we show how Vaidya's algorithm can be applied to SFM to obtain a faster weakly polynomial running time. Also in Theorem 71 we present a simple geometric argument that SFM can be solved with $O(n^3 \log n \cdot \mathrm{EO})$ oracle calls but with exponential time. These results can be read with only a working knowledge of the Lovász extension of submodular functions.

13 Preliminaries 

Here we introduce background on submodular function minimization (SFM) and notation that we 
use throughout Part III. Our exposition is kept to a minimal amount sufficient for our purposes. 
We refer interested readers to the extensive survey by McCormick [81] for further intuition. 

13.1 Submodular Function Minimization 

Throughout the rest of the paper, let $V = \{1, \ldots, n\} = [n]$ denote a ground set and let $f : 2^V \to \mathbb{Z}$ denote a submodular function defined on subsets of this ground set. We use $V$ and $[n]$ interchangeably and let $[0] := \emptyset$. We abuse notation by letting $S + i = S \cup \{i\}$ and $S - i = S \setminus \{i\}$ for an element $i \in V$ and a set $S \subseteq V$. Formally, we call a function submodular if it obeys the following property of diminishing marginal differences:

Definition 58 (Submodularity). A function $f : 2^V \to \mathbb{Z}$ is submodular if $f(T + i) - f(T) \le f(S + i) - f(S)$ for any $S \subseteq T$ and $i \in V \setminus T$.
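To make Definition 58 concrete, the following brute-force check (a sketch, exponential in $n$ and only meant for tiny ground sets) tests the diminishing-marginal-differences inequality directly. The example function, a weighted directed cut plus a modular term on a hypothetical 4-element ground set indexed from 0, is our own illustration and is not part of the algorithms in this paper.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterator

SetFunction = Callable[[FrozenSet[int]], int]


def all_subsets(n: int) -> Iterator[FrozenSet[int]]:
    """Enumerate every subset of the ground set {0, ..., n-1}."""
    for k in range(n + 1):
        for combo in combinations(range(n), k):
            yield frozenset(combo)


def is_submodular(f: SetFunction, n: int) -> bool:
    """Check f(T + i) - f(T) <= f(S + i) - f(S) for all S ⊆ T and i ∉ T (Definition 58)."""
    subsets = list(all_subsets(n))
    for T in subsets:
        for S in subsets:
            if not S <= T:
                continue
            for i in range(n):
                if i not in T and f(T | {i}) - f(T) > f(S | {i}) - f(S):
                    return False
    return True


# Hypothetical example: weighted directed cut plus a modular term, so f(∅) = 0.
EDGES = [(0, 1, 2), (1, 2, 1), (2, 0, 3), (0, 3, 1)]
WEIGHTS = [-2, -1, 1, -1]


def f(S: FrozenSet[int]) -> int:
    cut = sum(w for u, v, w in EDGES if u in S and v not in S)
    return cut + sum(WEIGHTS[i] for i in S)


print(is_submodular(f, 4))                      # True: cut + modular is submodular
print(is_submodular(lambda S: len(S) ** 2, 4))  # False: |S|^2 is supermodular
```

Directed cut functions are a standard source of submodular examples, which is why the same hypothetical `f` is reused in the sketches below.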

For convenience we assume without loss of generality that $f(\emptyset) = 0$ by replacing $f(S)$ with $f(S) - f(\emptyset)$ for all $S$. We also let $M := \max_{S \in 2^V} |f(S)|$.

The central goal of Part III is to design algorithms for SFM, i.e. computing a minimizer of $f$. We call such an algorithm strongly polynomial if its running time depends only polynomially on $n$ and EO, the time needed to compute $f(S)$ for a set $S$, and we call such an algorithm weakly polynomial if it also depends polylogarithmically on $M$.

13.2 Lovász Extension

Our new algorithms for SFM all consider a convex relaxation of a submodular function, known as the Lovász extension, and then carefully apply our cutting plane methods to it. Here we formally introduce the Lovász extension and present basic facts that we use throughout Part III.
The Lovász extension $\hat f : [0,1]^n \to \mathbb{R}$ of our submodular function $f$ is defined for all $x$ by

$$\hat f(x) = \mathbb{E}_{t \sim [0,1]}\big[f(\{i : x_i \ge t\})\big],$$

where $t \sim [0,1]$ is drawn uniformly at random from $[0,1]$. The Lovász extension allows us to reduce SFM to minimizing a convex function defined over the hypercube $[0,1]^n$. Below we state that the Lovász extension is a convex relaxation of $f$ and that it can be evaluated efficiently.


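Theorem 59. The Lovász extension $\hat f$ satisfies the following properties:

1. $\hat f$ is convex and $\min_{x \in [0,1]^n} \hat f(x) = \min_{S \subseteq [n]} f(S)$;

2. $f(S) = \hat f(I_S)$, where $I_S$ is the characteristic vector for $S$, i.e. $I_S(i) = 1$ if $i \in S$ and $I_S(i) = 0$ if $i \notin S$;

3. If $S$ is a minimizer of $f$, then $I_S$ is a minimizer of $\hat f$;

4. Suppose $x_1 \ge \cdots \ge x_n \ge x_{n+1} := 0$; then

$$\hat f(x) = \sum_{i=1}^n f([i])\,(x_i - x_{i+1}) = \sum_{i=1}^n \big(f([i]) - f([i-1])\big)\, x_i.$$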

Proof. See [50] or any standard textbook on combinatorial optimization, e.g. [94]. □ 
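As an illustration of Theorem 59, the sketch below evaluates $\hat f$ in two ways: through the sorted formula of item 4, and directly from the definition as an expectation over $t$, computed exactly because $t \mapsto f(\{i : x_i \ge t\})$ is constant between consecutive values of $x$. Both functions assume $f(\emptyset) = 0$, $x \in [0,1]^n$, and 0-indexed elements; the function names and the cut-plus-modular example are hypothetical.

```python
from typing import Callable, FrozenSet, Sequence

SetFunction = Callable[[FrozenSet[int]], float]


def lovasz_sorted(f: SetFunction, x: Sequence[float]) -> float:
    """Theorem 59(4): with x sorted decreasingly, sum_i (f([i]) - f([i-1])) * x_i."""
    order = sorted(range(len(x)), key=lambda i: -x[i])
    value, prefix, prev = 0.0, set(), f(frozenset())
    for i in order:
        prefix.add(i)
        cur = f(frozenset(prefix))
        value += (cur - prev) * x[i]
        prev = cur
    return value


def lovasz_expectation(f: SetFunction, x: Sequence[float]) -> float:
    """Definition: E_{t ~ U[0,1]} f({i : x_i >= t}), integrated piece by piece."""
    breakpoints = sorted({0.0, 1.0, *x})
    total = 0.0
    for a, b in zip(breakpoints, breakpoints[1:]):
        t = (a + b) / 2  # the level set is constant for t strictly between a and b
        level_set = frozenset(i for i, xi in enumerate(x) if xi >= t)
        total += (b - a) * f(level_set)
    return total


EDGES = [(0, 1, 2), (1, 2, 1), (2, 0, 3), (0, 3, 1)]  # hypothetical example
WEIGHTS = [-2, -1, 1, -1]


def f(S: FrozenSet[int]) -> float:
    return sum(w for u, v, w in EDGES if u in S and v not in S) + sum(WEIGHTS[i] for i in S)


x = [0.5, 0.2, 0.9, 0.2]
print(lovasz_sorted(f, x), lovasz_expectation(f, x))           # equal up to rounding
print(lovasz_sorted(f, [1, 0, 1, 0]) == f(frozenset({0, 2})))  # True: item 2
```

The sorted formula needs only $n$ evaluations of $f$ after an $O(n \log n)$ sort, which is the form used by the separation oracle below.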

Next we show that we can efficiently compute a subgradient of the Lovász extension or, alternatively, a separating hyperplane for the set of minimizers of our submodular function $f$. First we remind the reader of the definition of a separation oracle, and then we prove the necessary properties of the hyperplane, Theorem 61.

Definition 60 (separation oracle, Definition 1 restated for the Lovász extension). Given a point $x$ and a convex function $\hat f$ over a convex set $P$, $a^T y \le b$ is a separating hyperplane if $a^T x \ge b$ and any minimizer $x^*$ of $\hat f$ over $P$ satisfies $a^T x^* \le b$.

Theorem 61. Given a point $x \in [0,1]^n$, assume without loss of generality (by re-indexing the coordinates) that $x_1 \ge \cdots \ge x_n$. Then the following inequality is a valid separating hyperplane for $x$ and $\hat f$:

$$\sum_{i=1}^n \big(f([i]) - f([i-1])\big)\, y_i \le \hat f(x),$$

i.e., it satisfies the following:

1. (separating) $x$ lies on the hyperplane, i.e. $\sum_{i=1}^n (f([i]) - f([i-1]))\, x_i = \hat f(x)$.

2. (valid) For any $y$, we have $\sum_{i=1}^n (f([i]) - f([i-1]))\, y_i \le \hat f(y)$. In particular, $\sum_{i=1}^n (f([i]) - f([i-1]))\, x^*_i \le \hat f(x^*) \le \hat f(x)$ for any minimizer $x^*$, i.e. the separating hyperplane does not cut out any minimizer.

Moreover, such a hyperplane can be computed with $n$ oracle calls to $f$ and in time $O(n \cdot \mathrm{EO} + n^2)$.
Proof. Note that by Theorem 59 we have $\sum_{i \in [n]} (f([i]) - f([i-1]))\, x_i = \hat f(x)$, thus the hyperplane satisfies the separating condition. Moreover, computing it only takes $O(n \cdot \mathrm{EO} + n^2)$ time, as we simply need to sort the coordinates and evaluate $f$ at $n$ points, namely each of the sets $[i]$. All that remains is to show that the hyperplane satisfies the valid condition.

Let $L(t) = \{i : y_i \ge t\}$. Recall that $\hat f(y) = \mathbb{E}_{t \sim [0,1]}[f(L(t))]$. Thus $\hat f(y)$ can be written as a convex combination $\hat f(y) = \sum_t \alpha_t f(L(t))$, where $\alpha_t > 0$ and $\sum_t \alpha_t = 1$. However, by diminishing marginal differences (applied with $[i-1] \cap L(t) \subseteq [i-1]$) we see that for all $t$

$$\sum_{i \in L(t)} \big(f([i]) - f([i-1])\big) \le \sum_{i \in L(t)} \big(f([i] \cap L(t)) - f([i-1] \cap L(t))\big) = f(L(t)) - f(\emptyset) = f(L(t)),$$

and therefore, since $\sum_t \alpha_t I_{L(t)} = y$, we have

$$\sum_{i=1}^n \big(f([i]) - f([i-1])\big)\, y_i = \sum_t \alpha_t \sum_{i \in L(t)} \big(f([i]) - f([i-1])\big) \le \sum_t \alpha_t f(L(t)) = \hat f(y).$$

□

13.3 Polyhedral Aspects of SFM

Here we provide a natural primal dual view of SFM that we use throughout the analysis. We pair the problem of minimizing the Lovász extension with a dual convex optimization program and provide several properties of these programs. We believe the material in this section helps crystallize some of the intuition behind our algorithm, and we make heavy use of the notation presented in this section. However, we will not need to appeal to the strong duality of these programs in our proofs.

Consider the following primal and dual programs, where we use the shorthands $y(S) = \sum_{i \in S} y_i$ and $y^-_i = \min\{0, y_i\}$. Here the primal constraints are often called the base polyhedron $B(f) := \{y \in \mathbb{R}^n : y(S) \le f(S)\ \forall S \subseteq V,\ y(V) = f(V)\}$, and the dual program directly corresponds to minimizing the Lovász extension and thus $f$.

Primal: maximize $y^-(V)$ subject to $y(S) \le f(S)$ for all $S \subseteq V$, and $y(V) = f(V)$.

Dual: minimize $\hat f(x)$ subject to $0 \le x \le 1$.

Theorem 62. $h$ is a basic feasible solution (BFS) of the base polyhedron $B(f)$ if and only if

$$h_{v_i} = f(\{v_1, \ldots, v_i\}) - f(\{v_1, \ldots, v_{i-1}\})$$

for some permutation $v_1, \ldots, v_n$ of the ground set $V$. We call $v_1, \ldots, v_n$ the defining permutation of $h$, and we say that $v_i$ precedes $v_j$ if $i < j$.

This theorem gives a nice characterization of the BFS's of $B(f)$. It also gives the key observation underpinning our approach: the coefficients of each separating hyperplane in Theorem 61 precisely correspond to a primal BFS (Theorem 62). Our analysis relies heavily on this connection. We restate Theorem 61 in the language of BFS's.
Lemma 63. We have $h^T x \le \hat f(x)$ for any $x \in [0,1]^n$ and BFS $h$.

Proof. Any BFS is given by some permutation (Theorem 62). Thus this is just Theorem 61 in disguise. □
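The correspondence between separating hyperplanes and BFS's (Theorems 61 and 62, Lemma 63) can be checked numerically on a small instance. The sketch below builds the BFS $h$ from the decreasing order of a query point $x$, then verifies by brute force that $h \in B(f)$ and that $h^T y \le \hat f(y)$ at a few random points $y$. The helper names and the example function are hypothetical, and enumerating all subsets is only feasible for tiny $n$.

```python
import random
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence

SetFunction = Callable[[FrozenSet[int]], float]


def bfs_from_permutation(f: SetFunction, perm: Sequence[int]) -> List[float]:
    """Theorem 62: h_{v_i} = f({v_1..v_i}) - f({v_1..v_{i-1}})."""
    h = [0.0] * len(perm)
    prefix, prev = set(), f(frozenset())
    for v in perm:
        prefix.add(v)
        cur = f(frozenset(prefix))
        h[v] = cur - prev
        prev = cur
    return h


def lovasz(f: SetFunction, x: Sequence[float]) -> float:
    """Theorem 59(4): hat f(x) = h^T x for the BFS of x's decreasing order."""
    h = bfs_from_permutation(f, sorted(range(len(x)), key=lambda i: -x[i]))
    return sum(hi * xi for hi, xi in zip(h, x))


def in_base_polyhedron(f: SetFunction, h: Sequence[float], n: int, tol: float = 1e-9) -> bool:
    """Brute-force check of h(S) <= f(S) for all S and h(V) = f(V)."""
    for k in range(n + 1):
        for S in combinations(range(n), k):
            if sum(h[i] for i in S) > f(frozenset(S)) + tol:
                return False
    return abs(sum(h) - f(frozenset(range(n)))) <= tol


EDGES = [(0, 1, 2), (1, 2, 1), (2, 0, 3), (0, 3, 1)]  # hypothetical example
WEIGHTS = [-2, -1, 1, -1]


def f(S: FrozenSet[int]) -> float:
    return sum(w for u, v, w in EDGES if u in S and v not in S) + sum(WEIGHTS[i] for i in S)


n = 4
x = [0.5, 0.2, 0.9, 0.4]
h = bfs_from_permutation(f, sorted(range(n), key=lambda i: -x[i]))
print(in_base_polyhedron(f, h, n))  # True: h is a BFS of B(f)
rng = random.Random(0)
for _ in range(5):
    y = [rng.random() for _ in range(n)]
    assert sum(hi * yi for hi, yi in zip(h, y)) <= lovasz(f, y) + 1e-9  # Lemma 63
```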

We also note that since the objective function of the primal program is non-linear, we cannot 
say that the optimal solution to the primal program is a BFS. Instead we only know that it is a 
convex combination of the BFS’s that satisfy the following property. A proof can be found in any 
standard textbook on combinatorial optimization. 

Theorem 64. The above primal and dual programs have no duality gap. Moreover, there always exists a primal optimal solution $y = \sum_k \lambda_k h^{(k)}$ (a convex combination of BFS's $h^{(k)}$) such that any $i$ with $y_i < 0$ precedes any $j$ with $y_j > 0$ in the defining permutation of each BFS $h^{(k)}$.

Our algorithms will maintain collections of BFS's and use properties of $h \in B(f)$, i.e. convex combinations of BFS's. To simplify our analysis at several points we will want to assume that such a vector $h \in B(f)$ is non-degenerate, meaning it has both positive and negative entries. Below, we prove that degenerate points in the base polytope immediately allow us to trivially solve the SFM problem.

Lemma 65 (Degenerate Hyperplanes). If $h \in B(f)$ is non-negative then $\emptyset$ is a minimizer of $f$, and if $h$ is non-positive then $V$ is a minimizer of $f$.

Proof. While this follows immediately from Theorem 64, for completeness we prove it directly. Let $S \in 2^V$ be arbitrary. If $h \in B(f)$ is non-negative then by the definition of $B(f)$ we have

$$f(S) \ge h(S) = \sum_{i \in S} h_i \ge 0 = f(\emptyset).$$

On the other hand, if $h$ is non-positive then by definition we have

$$f(S) \ge h(S) = \sum_{i \in S} h_i \ge \sum_{i \in V} h_i = h(V) = f(V).$$

□


14 Improved Weakly Polynomial Algorithms for SFM 

In this section we show how our cutting plane method can be used to obtain an $O(n^2 \log nM \cdot \mathrm{EO} + n^3 \log^{O(1)} nM)$ time algorithm for SFM. Our main result in this section is the following theorem, which shows how directly applying our results from earlier parts to minimize the Lovász extension yields the desired running time.

Theorem 66. We have an $O(n^2 \log nM \cdot \mathrm{EO} + n^3 \log^{O(1)} nM)$ time algorithm for submodular function minimization.

Proof. We apply Theorem 42 to the Lovász extension $\hat f : [0,1]^n \to \mathbb{R}$ with the separation oracle given by Theorem 61. $\hat f$ fulfills the requirement on the domain, as its domain $D = [0,1]^n$ is symmetric about the point $(1/2, \ldots, 1/2)$ and has exactly $2n$ constraints.

In the language of Theorem 42, our separation oracle is an $(\eta, \delta)$-separation oracle with $\eta = 0$ and $\delta = 0$.
We first show that $\delta = 0$. Firstly, our separating hyperplane can be written as

$$\sum_{i=1}^n \big(f([i]) - f([i-1])\big)\, y_i \le \hat f(x) = \sum_{i=1}^n \big(f([i]) - f([i-1])\big)\, x_i,$$

where the equality follows from Theorem 59. Secondly, for any $y$ with $\hat f(y) \le \hat f(x)$ we have by Theorem 61 that

$$\sum_{i=1}^n \big(f([i]) - f([i-1])\big)\, y_i \le \hat f(y) \le \hat f(x),$$

which implies that $y$ is not cut away by the hyperplane.

Next we show that $\eta = 0$. Our separating hyperplane induces a valid halfspace whenever it is nonzero, i.e. $f([i]) \ne f([i-1])$ for some $i$. In the case that it is zero, i.e. $f([i]) = f([i-1])$ for all $i$, by the same argument as above we have $\hat f(x) = \sum_{i=1}^n (f([i]) - f([i-1]))\, x_i = 0$ and, for any $y$,

$$\hat f(y) \ge \sum_{i=1}^n \big(f([i]) - f([i-1])\big)\, y_i = 0 = \hat f(x).$$

In other words, $x$ is an exact minimizer, i.e. $\eta = 0$.

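Note that $|\hat f(x)| = |\mathbb{E}_{t \sim [0,1]}[f(\{i : x_i \ge t\})]| \le M$ as $M = \max_S |f(S)|$. Now plugging $\epsilon = \frac{1}{4M}$ into the guarantee of Theorem 31, we can find a point $x^*$ such that

$$\hat f(x^*) - \min_{x \in [0,1]^n} \hat f(x) \le \frac{1}{4M}\Big(\max_{x \in [0,1]^n} \hat f(x) - \min_{x \in [0,1]^n} \hat f(x)\Big) \le \frac{1}{4M}\,(2M) = \frac{1}{2}.$$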

We claim that $\min_{t \in [0,1]} f(\{i : x^*_i \ge t\})$ is the minimum of $f$. To see this, recall from Theorem 59 that $\hat f$ has an integer minimizer and hence $\min_{x \in [0,1]^n} \hat f(x) = \min_S f(S)$. Moreover, $\hat f(x^*)$ is a convex combination of the values $f(\{i : x^*_i \ge t\})$, which gives

$$\frac{1}{2} \ge \hat f(x^*) - \min_{x \in [0,1]^n} \hat f(x) = \hat f(x^*) - \min_S f(S) \ge \min_{t \in [0,1]} f(\{i : x^*_i \ge t\}) - \min_S f(S).$$

Since $f$ is integer-valued, we must then have $\min_{t \in [0,1]} f(\{i : x^*_i \ge t\}) = \min_S f(S)$ as desired. Since our separation oracle can be computed by $n$ oracle calls and runs in time $O(n \cdot \mathrm{EO} + n^2)$, by Theorem 42 the overall running time is then $O(n^2 \log nM \cdot \mathrm{EO} + n^3 \log^{O(1)} nM)$ as claimed. □
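The final rounding step of this proof only needs the $n + 1$ nested sets obtained by sorting $x^*$, since every level set $\{i : x^*_i \ge t\}$ is one of them. A minimal sketch, assuming an integer-valued $f$ with $f(\emptyset) = 0$ given as a Python callable (the function name and example are hypothetical); in the example below $\hat f(x) = -2.1$ is within $1$ of $\min_S f(S) = -3$, so the rounding is guaranteed to return a minimizer.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence, Tuple

SetFunction = Callable[[FrozenSet[int]], int]


def best_level_set(f: SetFunction, x: Sequence[float]) -> Tuple[FrozenSet[int], int]:
    """Return the best prefix S_0 ⊆ S_1 ⊆ ... ⊆ S_n of the decreasing order of x.

    Every level set {i : x_i >= t} is among these prefixes, so if
    hat f(x) - min_S f(S) < 1 and f is integer-valued, the result is a minimizer."""
    order = sorted(range(len(x)), key=lambda i: -x[i])
    best_set, best_val = frozenset(), f(frozenset())
    prefix = set()
    for i in order:
        prefix.add(i)
        val = f(frozenset(prefix))
        if val < best_val:
            best_set, best_val = frozenset(prefix), val
    return best_set, best_val


EDGES = [(0, 1, 2), (1, 2, 1), (2, 0, 3), (0, 3, 1)]  # hypothetical example
WEIGHTS = [-2, -1, 1, -1]


def f(S: FrozenSet[int]) -> int:
    return sum(w for u, v, w in EDGES if u in S and v not in S) + sum(WEIGHTS[i] for i in S)


brute_min = min(f(frozenset(c)) for k in range(5) for c in combinations(range(4), k))
S, val = best_level_set(f, [0.9, 0.8, 0.1, 0.7])
print(sorted(S), val, brute_min)  # [0, 1, 3] -3 -3
```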

Needless to say, the proof above depends entirely on Theorem 42. We remark that one can use Vaidya's cutting plane method instead of ours to get a time complexity of $O(n^2 \log nM \cdot \mathrm{EO} + n^{\omega+1} \log^{O(1)} n \cdot \log M)$. There is also an alternate argument that gives a time complexity of $O(n^2 \log M \cdot \mathrm{EO} + n^{O(1)} \cdot \log M)$; it requires slightly fewer oracle calls at the expense of a slower running time. A proof is offered in the remainder of this section, which can be skipped without loss of continuity. This proof relies on the following cutting plane method.

Theorem 67 ([13]). Given any convex set $K \subseteq [0,1]^n$ with a separation oracle of cost SO, in time $O(k \cdot \mathrm{SO} + n^{O(1)} k)$ one can either find a point $x \in K$ or find a polytope $P$ such that $K \subseteq P$ and the volume of $P$ is at most $(2/3)^k$.
The theorem allows us to decrease the volume of the feasible region by a factor of $(2/3)^k$ after $k$ iterations. Similar to above, we apply the cutting plane method to minimize $\hat f$ over the hypercube $[0,1]^n$ for $O(n \log M)$ iterations, and output an integral point in the remaining feasible region $P$.

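Lemma 68. Let $x^*$ achieve the minimum function value $\hat f(x^*)$ among the points used to query the separation oracle, and let $P^{(k)}$ denote the current feasible region. Then

1. $x^* \in P^{(k)}$.

2. Any $x \in [0,1]^n$ with $\hat f(x) \le \hat f(x^*)$ belongs to $P^{(k)}$.

3. Suppose $x^*_{i_1} \ge \cdots \ge x^*_{i_n}$ and let $S_j = \{i_1, \ldots, i_j\}$. Then the indicator vector $I_{S_i}$ of $S_i \in \arg\min_{S_j} f(S_j)$ also belongs to $P^{(k)}$.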

Proof. For any separating hyperplane $h^T y \le \hat f(x)$ given by a query point $x$, we have by Lemma 63 that $h^T x^* \le \hat f(x^*)$. Since $\hat f(x^*)$ is the minimum among all queried values $\hat f(x)$, we get $h^T x^* \le \hat f(x)$, and hence $x^*$ is not removed by any new separating hyperplane. In other words, $x^* \in P^{(k)}$. The argument for (2) is analogous.

For (3), recall that by the definition of the Lovász extension $\hat f(x^*)$ is a convex combination of the values $f(S_j)$, and thus the indicator vector $I_{S_i}$ of $S_i$ satisfies $\hat f(I_{S_i}) = f(S_i) \le \hat f(x^*)$. By Lemma 63 again, this implies $h^T I_{S_i} \le \hat f(I_{S_i}) \le \hat f(x^*) \le \hat f(x)$ for any separating hyperplane $h^T y \le \hat f(x)$. □

Theorem 69. Suppose that we run the cutting plane method of Theorem 67 for $O(n \log M)$ iterations. Then $S_i$ from the last lemma also minimizes $f$.

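Proof. We use the notation from the last lemma. After $k = K n \log_{3/2} M$ iterations, the volume of the feasible region is at most $(2/3)^k = M^{-Kn}$. By the last lemma, $I_{S_i} \in P^{(k)}$.

Suppose for the sake of contradiction that $S$ minimizes $f$ but $f(S) < f(S_i)$. Since $f$ is integer-valued, $f(S) + 1 \le f(S_i)$. Let $r := 1/(6M)$ and consider the set $P := \{x : 0 \le x_i \le r\ \forall i \notin S,\ 1 - r \le x_i \le 1\ \forall i \in S\}$. We claim that for $x \in P$,

$$\hat f(x) < f(S) + 1.$$

To show this, note that $f(\{i : x_i \ge t\}) = f(S)$ for $r < t \le 1 - r$, as $x_i \le r$ for $i \notin S$ and $x_i \ge 1 - r$ for $i \in S$. Now using conditional expectations and $|f(T)| \le M$ for any $T$,

$$\begin{aligned}
\hat f(x) &= \mathbb{E}_{t \sim [0,1]}\big[f(\{i : x_i \ge t\})\big] \\
&= (1 - 2r)\, \mathbb{E}\big[f(\{i : x_i \ge t\}) \mid r < t \le 1 - r\big] \\
&\quad + r\Big(\mathbb{E}\big[f(\{i : x_i \ge t\}) \mid 0 \le t \le r\big] + \mathbb{E}\big[f(\{i : x_i \ge t\}) \mid 1 - r < t \le 1\big]\Big) \\
&= (1 - 2r)\, f(S) + r\Big(\mathbb{E}\big[f(\{i : x_i \ge t\}) \mid 0 \le t \le r\big] + \mathbb{E}\big[f(\{i : x_i \ge t\}) \mid 1 - r < t \le 1\big]\Big) \\
&\le (1 - 2r)\, f(S) + 2rM \\
&\le f(S) + 4rM \\
&< f(S) + 1.
\end{aligned}$$

But now $P \subseteq P^{(k)}$ by (2) of the last lemma, since $\hat f(x) < f(S) + 1 \le f(S_i) \le \hat f(x^*)$ for every $x \in P$. This leads to a contradiction, since

$$\operatorname{vol}(P) = r^n = \frac{1}{(6M)^n} > \frac{1}{M^{Kn}} \ge \operatorname{vol}(P^{(k)})$$

for sufficiently large $K$. □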
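For concreteness, the constants in this proof can be written out explicitly. With $k = K n \log_{3/2} M$ iterations and $r = 1/(6M)$, and assuming $M \ge 2$ so that $\log_{3/2} M > 0$, the volume comparison reduces to a condition on $K$ alone:

```latex
\left(\tfrac{2}{3}\right)^{k}
  = \left(\tfrac{2}{3}\right)^{K n \log_{3/2} M}
  = M^{-Kn},
\qquad
\operatorname{vol}(P) = r^{n} = (6M)^{-n},
\qquad
4rM = \tfrac{2}{3} < 1,
\\[4pt]
M^{-Kn} < (6M)^{-n}
  \iff M^{K} > 6M
  \iff M^{K-1} > 6 .
```

So, for example, $K = 4$ suffices for every $M \ge 2$, since $2^3 = 8 > 6$.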
Corollary 70. There is an $O(n^2 \log M \cdot \mathrm{EO} + n^{O(1)} \log M)$ time algorithm for submodular function minimization.

Proof. This follows from Theorem 69, Theorem 67, and the fact that our separation oracle runs in time $O(n \cdot \mathrm{EO} + n^2)$. □

Curiously, we obtain $O(\log M)$ rather than $O(\log nM)$ as in our algorithm. We leave it as an open problem whether one can slightly improve our running time to $O(n^2 \log M \cdot \mathrm{EO} + n^3 \log^{O(1)} n \cdot \log M)$. The rest of this paper is devoted to obtaining better strongly polynomial running times.

15 Improved Strongly Polynomial Algorithms for SFM 

In this section we show how our cutting plane method can be used to obtain an $\tilde O(n^3 \cdot \mathrm{EO} + n^4)$ time algorithm for SFM, which improves over the currently fastest $O(n^5 \cdot \mathrm{EO} + n^6)$ time algorithm by Orlin.

15.1 Improved Oracle Complexity 

We first present a simple geometric argument that $f$ can be minimized with just $O(n^3 \log n \cdot \mathrm{EO})$ oracle calls. While this is our desired query complexity (and it improves upon the previous best known bounds by a factor of $O(n^2)$), unfortunately the algorithm runs in exponential time. Nevertheless, it does provide some insight into how our more efficient algorithms should proceed, and on its own it suggests that, information theoretically, $O(n^3 \log n \cdot \mathrm{EO})$ calls suffice to solve SFM. In the rest of the paper, we combine this insight with some of the existing SFM tools developed over the last decade to get improved polynomial time algorithms.

Theorem 71. Submodular functions can be minimized with $O(n^3 \log n \cdot \mathrm{EO})$ oracle calls.

Proof. We use the cutting plane method in Theorem 67 with the separation oracle given by Theorem 61. This method reduces the volume of the feasible region by a factor of $(2/3)^k$ after $k$ iterations if the optimum has not been found yet.

Now, we argue that after $O(n \log n)$ iterations of this procedure we have either found a minimizer of $f$ or we have enough information to reduce the dimension of the problem by 1. To see this, first note that if the separation oracle ever returns a degenerate hyperplane, then by Lemma 65 either $\emptyset$ or $V$ is the minimizer, which we can determine in time $O(\mathrm{EO} + n)$. Otherwise, after $100 n \log n$ iterations, our feasible region $P$ must have a volume of at most $1/n^{5n}$. In this case, we claim that the remaining integer points in $P$ all lie on a hyperplane. This holds because, if it were not the case, there would be a simplex $\Delta$ with integral vertices $v_0, v_1, \ldots, v_n$ contained in $P$. But then

$$\operatorname{vol}(P) \ge \operatorname{vol}(\Delta) = \frac{1}{n!}\,\big|\det(v_1 - v_0,\ v_2 - v_0,\ \ldots,\ v_n - v_0)\big| \ge \frac{1}{n!},$$

where the last inequality holds since the determinant of an integral matrix is integral (and it is nonzero because $\Delta$ is full-dimensional), yielding a contradiction.

In other words, after $O(n \log n)$ iterations we have reduced the dimension of all viable solutions by at least 1. Thus, we can recurse by applying the cutting plane method to the lower dimensional feasible region, i.e. $P$ is replaced by the convex hull of all the remaining integer points. There is a minor technical issue we need to address, as our space is now lower dimensional, the starting region is not necessarily the hypercube anymore, and the starting volume is not necessarily equal to 1.
We argue that the starting volume is bounded by $(2\sqrt{n})^n$. If this is indeed the case, then our previous argument still works, as the volume goes down by a factor of $1/n^{O(n)}$ in $O(n \log n)$ iterations.

Let $v \in P$ be an integer point. Now the $\dim(P)$-dimensional ball of radius $\sqrt{n}$ centered at $v$ must contain all the other integer points in $P$, as any two points of $\{0,1\}^n$ are at most $\sqrt{n}$ apart. Thus the volume of $P$ is bounded by the volume of this ball, which is at most $(2\sqrt{n})^{\dim(P)} \le (2\sqrt{n})^n$. Now to get the volume down to $1/n^{5n}$, the number of iterations is still $O(n \log n)$.
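The iteration count in the recursion can be made explicit. Assuming a starting volume of at most $(2\sqrt{n})^n$ and a target volume of $n^{-cn}$ for some constant $c$, it suffices that

```latex
\left(\tfrac{2}{3}\right)^{k} (2\sqrt{n})^{n} \le n^{-cn}
\iff
k \ln\tfrac{3}{2} \ge n \ln 2 + \left(c + \tfrac{1}{2}\right) n \ln n
\iff
k \ge \frac{n \ln 2 + \left(c + \tfrac{1}{2}\right) n \ln n}{\ln(3/2)} = O(n \log n).
```

Hence each dimension reduction costs $O(n \log n)$ iterations, with the constant $c$ only affecting the hidden constant.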

In summary, we have reduced our dimension by 1 using $O(n \log n)$ iterations, which requires $O(n^2 \log n \cdot \mathrm{EO})$ oracle time (as each separating hyperplane is computed with $n$ oracle calls). This can happen at most $n$ times. The overall query complexity is then $O(n^3 \log n \cdot \mathrm{EO})$.

Note that the minimizer $x$ obtained may not be integral. This is not a problem, as the definition of the Lovász extension implies that if $\hat f(x)$ is minimal, then $f(\{i : x_i \ge t\})$ is minimal for any $t \in (0, 1]$.

We remark that this algorithm does not have a polynomial runtime. Even though all the integral vertices of $P$ lie on a hyperplane, the best way we know of to identify it takes exponential time, by checking all the integer points $\{0,1\}^n$. □
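The integrality step in the proof of Theorem 71 (a full-dimensional simplex with integral vertices has volume at least $1/n!$) can be checked on a small case. The sketch below uses hypothetical helper names and exact rational arithmetic from Python's standard `fractions` module; it computes the volume of every simplex spanned by four vertices of $\{0,1\}^3$ and confirms that each is either degenerate or has volume at least $1/3!$.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial
from typing import List, Sequence


def det_exact(matrix: Sequence[Sequence[int]]) -> Fraction:
    """Determinant by Gaussian elimination over the rationals (no rounding)."""
    m: List[List[Fraction]] = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def simplex_volume(vertices: Sequence[Sequence[int]]) -> Fraction:
    """vol(conv{v_0, ..., v_n}) = |det(v_1 - v_0, ..., v_n - v_0)| / n!."""
    v0, rest = vertices[0], vertices[1:]
    n = len(v0)
    # Row i holds coordinate i of each edge vector v_j - v_0 (det is transpose-invariant).
    edge_matrix = [[v[i] - v0[i] for v in rest] for i in range(n)]
    return abs(det_exact(edge_matrix)) / factorial(n)


cube = list(product((0, 1), repeat=3))
volumes = {simplex_volume(s) for s in combinations(cube, 4)}
print(sorted(volumes))  # every nonzero volume is a multiple of 1/3! = 1/6
```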

Remark 72. Note that this algorithm works for minimizing any convex function over the hypercube that attains its optimal value at a vertex of the hypercube. Formally, our proof of Theorem 71 holds whenever a function $f : 2^V \to \mathbb{R}$ admits a convex relaxation $\hat f$ with the following properties:

1. For every $S \subseteq V$, $\hat f(I_S) = f(S)$.

2. Every $\hat f(x)$ can be written as a convex combination $\sum_{S \in \mathcal{S}} \alpha_S f(S)$, where $\alpha_S \ge 0$, $\sum_{S \in \mathcal{S}} \alpha_S = 1$, $|\mathcal{S}| = O(n)$, and $\mathcal{S}$ can be computed without any oracle call.

3. A subgradient $\partial \hat f(x)$ of $\hat f$ at any point $x \in [0,1]^n$ can be computed with $O(n \cdot \mathrm{EO})$ oracle calls.

In this case, the proof of Theorem 71 implies that $f$ and $\hat f$ can be minimized with $O(n^3 \log n \cdot \mathrm{EO})$ oracle calls by using the separating hyperplane $\partial \hat f(x)^T (y - x) \le 0$.

15.2 Technical Tools 

To improve upon the running time of the algorithm in the previous section, we use more structure 
of our submodular function /. Rather than merely showing that we can decrease the dimension of 
our SFM problem by 1 we show how we can reduce the degrees of freedom of our problem in a more 
principled way. In Section 15.2.1 we formally define the abstraction we use for this and discuss how 
to change our separation oracle to accommodate this abstraction, and in Section 15.2.2 we show 
how we can deduce these constraints. These tools serve as the foundation for the faster strongly 
polynomial time SFM algorithms we present in Section 15.3 and Section 15.4. 

15.2.1 SFM over Ring Family 

For the remainder of the paper we consider a more general problem than SFM in which we wish to compute a minimizer of our submodular function $f$ over a ring family of the ground set $V = [n]$. A ring family $\mathcal{F}$ is a collection of subsets of $V$ such that for any $S_1, S_2 \in \mathcal{F}$, we have $S_1 \cup S_2,\ S_1 \cap S_2 \in \mathcal{F}$. Thus SFM corresponds to the special case where $\mathcal{F}$ consists of every subset of $V$. This generalization has been considered before in the literature and was essential to the IFF algorithm.

It is well known that any ring family $\mathcal{F}$ over $V$ can be represented by a directed graph $D = (V, A)$ where $S \in \mathcal{F}$ iff $S$ contains all of the descendants of any $i \in S$. An equivalent definition is that for any arc $(i, j) \in A$, $i \in S$ implies $j \in S$. It is customary to assume that $A$ is acyclic, as any (directed) cycle of $A$ can be contracted (see Section 15.3.1).

We denote by $R(i)$ the set of descendants of $i$ (including $i$ itself) and by $Q(i)$ the set of ancestors of $i$ (including $i$ itself). Polyhedrally, an arc $(i, j) \in A$ can be encoded as the constraint $x_i \le x_j$, as shown by the next lemma.
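The digraph representation of a ring family, together with the sets $R(i)$ and $Q(i)$, is easy to compute by graph search. The sketch below (hypothetical function names; elements indexed from 0) tests membership $S \in \mathcal{F}$ via the arc condition and computes $R(i)$ and $Q(i)$ by depth-first search; the usage lines enumerate the family of a small example and check closure under union and intersection.

```python
from collections import defaultdict
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

Arc = Tuple[int, int]


def descendants(n: int, arcs: Iterable[Arc]) -> List[FrozenSet[int]]:
    """R(i): all vertices reachable from i along arcs (i, j), including i itself."""
    succ = defaultdict(list)
    for i, j in arcs:
        succ[i].append(j)
    result = []
    for s in range(n):
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for w in succ[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        result.append(frozenset(seen))
    return result


def ancestors(n: int, arcs: Iterable[Arc]) -> List[FrozenSet[int]]:
    """Q(i): descendants of i in the reversed digraph, including i itself."""
    return descendants(n, [(j, i) for i, j in arcs])


def in_ring_family(S: FrozenSet[int], arcs: Iterable[Arc]) -> bool:
    """S is in the family iff every arc (i, j) with i in S also has j in S."""
    return all(j in S for i, j in arcs if i in S)


n, arcs = 4, [(0, 1), (1, 2)]  # hypothetical DAG: 0 -> 1 -> 2, element 3 unconstrained
family = [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)
          if in_ring_family(frozenset(c), arcs)]
assert all(in_ring_family(A | B, arcs) and in_ring_family(A & B, arcs)
           for A in family for B in family)
print(descendants(n, arcs)[0], ancestors(n, arcs)[2])
```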

Lemma 73. Let $\mathcal{F}$ be a ring family over $V$ and $D = (V, A)$ be its directed acyclic graph representation. Suppose $f : 2^V \to \mathbb{R}$ is submodular with Lovász extension $\hat f$. Then the characteristic vector $I_S$ of any minimizer $S = \arg\min_{S \in \mathcal{F}} f(S)$ over $\mathcal{F}$ is also a solution to

minimize $\hat f(x)$
subject to $x_i \le x_j$ for all $(i, j) \in A$,     (15.1)
$0 \le x \le 1$.

Proof. Let $x^*$ be a minimizer, and let $L(t) = \{i : x^*_i \ge t\}$. It is easy to check that the indicator vector $I_{L(t)}$ satisfies (15.1) since $x^*$ does. Moreover, recall that $\hat f(x^*) = \mathbb{E}_{t \sim [0,1]}[f(L(t))]$. Thus $\hat f(x^*)$ can be written as a convex combination $\hat f(x^*) = \sum_t \alpha_t \hat f(I_{L(t)})$, where $\alpha_t > 0$ and $\sum_t \alpha_t = 1$. Thus all such $\hat f(I_{L(t)})$ are minimal, i.e. (15.1) has no “integrality gap”. □

We also modify our separation oracle to accommodate this generalization as follows. Before doing so we need a definition which relates our BFS's to the ring family formalism.

Definition 74. A permutation $(v_1, \ldots, v_n)$ of $V$ is said to be consistent with an arc $(i, j)$ if $j$ precedes $i$ in $(v_1, \ldots, v_n)$. Similarly, a BFS of the base polyhedron is consistent with $(i, j)$ if $j$ precedes $i$ in its defining permutation. A permutation $(v_1, \ldots, v_n)$ (or a BFS) is consistent with $A$ if it is consistent with every $(i, j) \in A$.

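Readers may find it helpful to keep in mind the following picture, which depicts the relative positions of $R(i)$, $i$, and $Q(i)$ in the defining permutation of a BFS $h$ that is consistent with $A$:

    [ elements of R(i) - i ]  ...  i  ...  [ elements of Q(i) - i ]

That is, every descendant of $i$ precedes $i$, and every ancestor of $i$ comes after $i$.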

In Theorem 61, given $x \in [0,1]^n$, our separating hyperplane is constructed by sorting the entries of $x$. This hyperplane is associated with some BFS $h$ of the base polyhedron. As we shall see towards the end of the section, we would like $h$ to be consistent with every arc $(i, j) \in A$.

This task is easy initially, as $x$ satisfies $x_i \le x_j$ for $(i, j) \in A$ for the starting polytope of (15.1). If $x_i < x_j$, nothing special has to be done, as $j$ must precede $i$ in the ordering. On the other hand, whenever $x_i = x_j$, we can always break ties by ranking $j$ ahead of $i$.

However, a technical issue arises from the fact that our cutting plane algorithm may drop constraints from the current feasible region $P$. In other words, $x$ may violate $x_i \ge 0$, $x_j \le 1$ or $x_i \le x_j$ if that constraint is ever dropped. Fortunately, this can be fixed by reintroducing the constraint. We summarize the modification needed in the pseudocode below and formally show that it fulfills our requirement.

Lemma 75. Our modified separation oracle returns either some BFS $h = 0$ or a valid separating hyperplane, i.e.

1. $x$ either lies on the separating hyperplane or is cut away by it.

2. Any minimizer of (15.1) is not cut away by the separating hyperplane.
Algorithm 5: Modified Separation Oracle
Input: $x \in \mathbb{R}^n$ and the set of arcs $A$
if $x_i < 0$ for some $i$ then
    Output: $x_i \ge 0$
else if $x_j > 1$ for some $j$ then
    Output: $x_j \le 1$
else if $x_i > x_j$ for some $(i, j) \in A$ then
    Output: $x_i \le x_j$
else
    Let $i_1, \ldots, i_n$ be a permutation of $V$ such that $x_{i_1} \ge \cdots \ge x_{i_n}$ and, for all $(i, j) \in A$, $j$ precedes $i$ in $i_1, \ldots, i_n$.
    Output: $h^T y \le \hat f(x)$, where $h$ is the BFS defined by the permutation $i_1, \ldots, i_n$.


Such a hyperplane can be computed with $n$ oracle calls to $f$ and in time $O(n \cdot \mathrm{EO} + n^2)$.

Proof. If we get $x_i \ge 0$, $x_j \le 1$ or $x_i \le x_j$ (the if branch or the first two else-if branches), then clearly $x$ is cut away by it, and any minimizer $x^*$ must of course satisfy $x^*_i \ge 0$, $x^*_j \le 1$ and $x^*_i \le x^*_j$, as these are constraints in (15.1). This proves (1) and (2) in these cases.

Thus it remains to consider the case $h^T y \le \hat f(x)$ (the last else branch). First of all, $x$ lies on it, as $\hat f(x) = h^T x$. This proves (1). For (2), we have from Lemma 63 that $h^T y \le \hat f(y)$ for all $y \in [0,1]^n$. If $x^*$ is a minimizer of (15.1), we must then have $h^T x^* \le \hat f(x^*) \le \hat f(x)$, as $x$ is also feasible for (15.1).

Finally, we note that the running time is self-evident. □
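A possible Python rendering of Algorithm 5 is sketched below. It is our own illustration, not the authors' implementation: each output hyperplane is returned as a pair $(a, b)$ meaning $a^T y \le b$, and ties in the sort are broken by a topological order (Kahn's algorithm with a max-heap on $x$) so that $j$ precedes $i$ for every arc $(i, j) \in A$. It assumes $A$ is acyclic and $f(\emptyset) = 0$; the function names and the example are hypothetical.

```python
import heapq
from typing import Callable, FrozenSet, List, Sequence, Tuple

Arc = Tuple[int, int]
Halfspace = Tuple[List[float], float]  # (a, b) encodes the halfspace a^T y <= b


def consistent_order(x: Sequence[float], arcs: Sequence[Arc]) -> List[int]:
    """Order V by decreasing x so that j precedes i for every arc (i, j).

    Assumes A is acyclic and x_i <= x_j on every arc; ties are broken by a
    topological sort (Kahn's algorithm) that always emits the largest available x."""
    n = len(x)
    waiting = [0] * n                                 # unplaced j with an arc (i, j)
    release: List[List[int]] = [[] for _ in range(n)]  # release[j]: all i with an arc (i, j)
    for i, j in arcs:
        waiting[i] += 1
        release[j].append(i)
    heap = [(-x[v], v) for v in range(n) if waiting[v] == 0]
    heapq.heapify(heap)
    order: List[int] = []
    while heap:
        _, v = heapq.heappop(heap)
        order.append(v)
        for i in release[v]:
            waiting[i] -= 1
            if waiting[i] == 0:
                heapq.heappush(heap, (-x[i], i))
    if len(order) != n:
        raise ValueError("the arc set A must be acyclic")
    return order


def modified_separation_oracle(f: Callable[[FrozenSet[int]], float],
                               x: Sequence[float], arcs: Sequence[Arc]) -> Halfspace:
    """Sketch of Algorithm 5; assumes f(empty set) = 0."""
    n = len(x)

    def unit(k: int, sign: float) -> List[float]:
        a = [0.0] * n
        a[k] = sign
        return a

    for i in range(n):
        if x[i] < 0:
            return unit(i, -1.0), 0.0          # reintroduce x_i >= 0 as -y_i <= 0
    for j in range(n):
        if x[j] > 1:
            return unit(j, 1.0), 1.0           # reintroduce x_j <= 1
    for i, j in arcs:
        if x[i] > x[j]:
            a = unit(i, 1.0)
            a[j] = -1.0
            return a, 0.0                      # reintroduce x_i <= x_j
    h = [0.0] * n
    prefix, prev = set(), f(frozenset())
    for v in consistent_order(x, arcs):        # BFS of a permutation consistent with A
        prefix.add(v)
        cur = f(frozenset(prefix))
        h[v] = cur - prev
        prev = cur
    return h, sum(hi * xi for hi, xi in zip(h, x))  # h^T y <= hat f(x)


# Hypothetical example: cut-plus-modular f on 4 elements, one arc (0, 1).
EDGES = [(0, 1, 2), (1, 2, 1), (2, 0, 3), (0, 3, 1)]
WEIGHTS = [-2, -1, 1, -1]


def f(S: FrozenSet[int]) -> float:
    return sum(w for u, v, w in EDGES if u in S and v not in S) + sum(WEIGHTS[i] for i in S)


print(consistent_order([0.5, 0.5, 0.2], [(0, 1)]))                    # [1, 0, 2]
print(modified_separation_oracle(f, [0.9, 0.2, 0.5, 0.5], [(0, 1)]))  # arc cut x_0 <= x_1
```

Using a heap keyed on $-x$ keeps the order non-increasing in $x$ while letting the arc constraints decide ties, which is exactly the tie-breaking rule described before Lemma 75.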

We stress again that the main purpose of modifying our separation oracle is to ensure that any BFS $h$ used to define a new separating hyperplane is consistent with every $(i, j) \in A$.

15.2.2 Identifying New Valid Arcs

The reason for considering the ring family generalization of SFM is that our algorithms (and some previous algorithms too) work by adding new arcs to our digraph $D$. This operation yields a strongly polynomial algorithm since there are only $2 \cdot \binom{n}{2}$ possible arcs to add. Of course, a new arc $(i, j)$ is valid only if $i \in S_{\min} \Rightarrow j \in S_{\min}$ for some minimizer $S_{\min}$. Here we show how to identify such valid arcs by extracting information from certain nice elements of the base polyhedron.

This is guaranteed by the next four lemmas, which are stated differently from previous works; e.g. our versions are extended to the ring family setting. This is necessary as our algorithms require a more general formulation. We also give a new polyhedral proof, which is mildly simpler than the previous combinatorial proof. On the other hand, Lemma 80 is new and unique to our work. It is an important ingredient of our $\tilde O(n^3 \cdot \mathrm{EO} + n^4)$ time algorithm.

Recall that each BFS of the base polyhedron is defined by some permutation of the ground set 
elements. 

First, we prove the following two lemmas, which show that should we ever encounter a non-degenerate point in the base polytope with a coordinate of very large absolute value, then we can immediately conclude that the corresponding element must be in every solution, or cannot be in any solution, to SFM over the ring family.

Lemma 76. If $y \in B(f)$ is non-degenerate and satisfies $y_i > -(n - 1) \min_j y_j$, then $i$ is not in any minimizer of $f$ (over the ring family represented by $A$).
Proof. We proceed by contradiction and suppose that $S$ is a minimizer of $f$ that contains $i$. Now since $y$ is non-degenerate we know that $\min_j y_j < 0$, and by the definition of $y$ we have the following contradiction:

$$0 < y_i + (n - 1) \min_j y_j \le \sum_{j \in S} y_j = y(S) \le f(S) \le f(\emptyset) = 0.$$

□


Lemma 77. If $y \in B(f)$ is non-degenerate and satisfies $y_i < -(n - 1) \max_j y_j$, then $i$ is in every minimizer of $f$ (over the ring family represented by $A$).

Proof. We proceed by contradiction and suppose that S is a minimizer of / that does not contain 
i. Now since y is non-degenerate we know that maxj yj > 0 and therefore 

y] yj = 2 /i + y' Vj + y yj < -in - l) maxy^ + V yj + (|1^| - |S| - 1) maxy^ < V ?/j . 
je[n] jes jev-is+i) ^ jes ^ j&s 

However by the definition of y we have 

J2yj=y{S)<f{S)<f{V)= y y,. 

j&S je[n] 

Thus we have a contradiction and the result follows. □ 
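The certificates of Lemmas 76 and 77 are simple threshold tests. A minimal Python sketch, assuming "non-degenerate" means that $y$ has both a strictly positive and a strictly negative entry (our reading of how the two proofs use the term); the function name is hypothetical:

```python
def lemma76_77_certificates(y):
    """Given a non-degenerate y in B(f) (dict: element -> value), return
    (excluded, included): elements certified to lie in no minimizer
    (Lemma 76) and in every minimizer (Lemma 77)."""
    n = len(y)
    lo, hi = min(y.values()), max(y.values())
    if not (lo < 0 < hi):
        raise ValueError("y must have a positive and a negative entry")
    excluded = {i for i, v in y.items() if v > -(n - 1) * lo}  # Lemma 76
    included = {i for i, v in y.items() if v < -(n - 1) * hi}  # Lemma 77
    return excluded, included

# Modular examples f(S) = y(S): B(f) = {y}, minimizer = negative entries.
ex1 = lemma76_77_certificates({0: 10, 1: -1, 2: -1, 3: -1})  # ({0}, set())
ex2 = lemma76_77_certificates({0: 1, 1: 1, 2: 1, 3: -10})    # (set(), {3})
```

For a modular function $f(S) = y(S)$ the base polyhedron is the single point $y$, and when $y$ has no zero entries the unique minimizer is the set of negative entries, so both examples agree with the true minimizer.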

Now we are ready to present conditions under which a new valid arc can be added. We begin with a simple observation. Let $\mathrm{upper}(i) = f(R(i)) - f(R(i) - i)$ and $\mathrm{lower}(i) = f(V \setminus Q(i) + i) - f(V \setminus Q(i))$. As the names suggest, they bound the value of $h_i$ for any BFS $h$ used.

Lemma 78. For any BFS $h$ used to construct a separating hyperplane given by our modified separation oracle, we have $\mathrm{lower}(i) \le h_i \le \mathrm{upper}(i)$.

Proof. Note that by Lemma 75, $h$ is consistent with every $(j_1, j_2) \in A$; hence, in the defining permutation of $h$, every element of $R(i) - i$ precedes $i$ and every element of $Q(i) - i$ comes after $i$. Let $S$ be the set of elements preceding $i$ in the defining permutation of $h$. Then $h_i = f(S + i) - f(S) \le f(R(i)) - f(R(i) - i)$ by diminishing returns, since $R(i) - i \subseteq S$. The lower bound follows from the same argument: as $Q(i) - i$ comes after $i$, we have $S \subseteq V \setminus Q(i)$, and so $h_i \ge f(V \setminus Q(i) + i) - f(V \setminus Q(i))$. $\square$
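Lemma 78 can be sanity-checked numerically. The sketch below (hypothetical helper names; an arc $(i,j)$ encodes $i \in S \Rightarrow j \in S$, so a consistent permutation must list $j$ before $i$) computes $R(i)$ and $Q(i)$ by graph search, evaluates $\mathrm{upper}$ and $\mathrm{lower}$, and verifies $\mathrm{lower}(i) \le h_i \le \mathrm{upper}(i)$ for every BFS $h$ coming from a consistent permutation. The submodular example is the same hypothetical cut-plus-modular function as above, redefined so that the block runs on its own.

```python
from itertools import permutations

def cut_plus_modular(edges, w):
    """Hypothetical example: f(S) = (#edges cut by S) + w(S)."""
    def f(S):
        S = frozenset(S)
        return sum(1 for u, v in edges if (u in S) != (v in S)) + sum(w[e] for e in S)
    return f

def greedy_bfs(f, perm):
    """BFS defined by perm: marginal value of each element given its predecessors."""
    h, prefix, prev = {}, frozenset(), f(frozenset())
    for e in perm:
        prefix = prefix | {e}
        cur = f(prefix)
        h[e], prev = cur - prev, cur
    return h

def reach(arcs, start):
    """Vertices reachable from start along arcs, including start itself."""
    adj = {}
    for u, v in arcs:
        adj.setdefault(u, []).append(v)
    seen, stack = {start}, [start]
    while stack:
        for v in adj.get(stack.pop(), []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return frozenset(seen)

def upper_lower(f, ground, arcs):
    """upper(i) = f(R(i)) - f(R(i) - i), lower(i) = f(V - Q(i) + i) - f(V - Q(i))."""
    V = frozenset(ground)
    reversed_arcs = [(v, u) for u, v in arcs]
    up, low = {}, {}
    for i in V:
        R, Q = reach(arcs, i), reach(reversed_arcs, i)  # descendants, ancestors
        up[i] = f(R) - f(R - {i})
        low[i] = f((V - Q) | {i}) - f(V - Q)
    return up, low

def is_consistent(perm, arcs):
    """Arc (i, j) means i in S implies j in S, so j must come before i."""
    pos = {e: k for k, e in enumerate(perm)}
    return all(pos[j] < pos[i] for i, j in arcs)

def check_lemma78(f, ground, arcs):
    up, low = upper_lower(f, ground, arcs)
    return all(low[i] <= h[i] <= up[i]
               for p in permutations(ground) if is_consistent(p, arcs)
               for h in [greedy_bfs(f, p)]
               for i in ground)

f = cut_plus_modular([(0, 1), (1, 2), (2, 0), (2, 3)], [-2, 1, 0, -1])
arcs = [(0, 2), (3, 2)]   # 0 in S => 2 in S, and 3 in S => 2 in S
up, low = upper_lower(f, range(4), arcs)
ok = check_lemma78(f, range(4), arcs)
```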

In the following two lemmas, we show that if $\mathrm{upper}(i)$ is ever sufficiently positive or $\mathrm{lower}(i)$ is sufficiently negative, then we find a new arc.

While these lemmas may appear somewhat technical, they actually have an intuitive interpretation. Suppose an element $p$ is in a minimizer $S_{\min}$ of $f$ over the ring family $D$. Then $R(p)$ must also be part of $S_{\min}$. Now if $f(R(p))$ is very large relative to $f(R(p) - p)$, there should be some element $q \in S_{\min} \setminus R(p)$ compensating for the discrepancy. The lemma says that such an element $q$ can in fact be found efficiently.

Lemma 79 (new arc). Let $y = \sum_k \lambda_k y^{(k)}$ be a non-degenerate convex combination of $O(n)$ base polyhedron BFS's $y^{(k)}$ which are consistent with every arc $(i,j) \in A$. If some element $p$ satisfies $\mathrm{upper}(p) > n^4 \max_j y_j$, then we can find, using $O(n)$ oracle calls (time $O(n \cdot \mathrm{EO})$) and $O(n^2)$ additional time, some $q \notin R(p)$ such that the arc $(p,q)$ is valid, i.e. if $p$ is in a minimizer, then so is $q$.
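Proof. If $\max_j y_j \le 0$ then we are immediately done by Lemma 65, so we assume $\max_j y_j > 0$ in the proof. For all $k$ let $y'^{(k)}$ be the BFS obtained by taking the defining permutation of $y^{(k)}$ and moving $R(p)$ to the front while preserving the relative ordering of $R(p)$ (and of $V \setminus R(p)$) within each permutation. Furthermore, let $y' = \sum_k \lambda_k y'^{(k)}$. Since every element of $R(p) - p$ is a descendant of $p$ and each permutation is consistent with $A$, $p$ is the last element of $R(p)$ in each modified permutation, so $y'^{(k)}_p = f(R(p)) - f(R(p) - p) = \mathrm{upper}(p)$ for every $k$. Hence

$$\mathrm{upper}(p) = y'_p = f(R(p)) - f(R(p) - p).$$

Moreover,

$$y'_j \ge y_j \quad \forall j \in R(p) \qquad \text{and} \qquad y'_j \le y_j \quad \forall j \notin R(p) \tag{15.2}$$

by diminishing marginal return.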

Now, suppose $p$ is in a minimizer $S_{\min}$. Then $R(p) \subseteq S_{\min}$ by definition. We then define $f'(S) = f(S \cup R(p))$ for $S \subseteq V \setminus R(p)$. It can be checked readily that $f'$ is submodular and $S_{\min} \setminus R(p)$ is a minimizer of $f'$ (over the corresponding ring family). Note that now $y'|_{V \setminus R(p)}$ (the restriction of $y'$ to $V \setminus R(p)$) is a convex combination of BFS's of the base polyhedron $B(f')$ of $f'$. We shall show that $y'|_{V \setminus R(p)}$ has the desired property in Lemma 77.

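Note that $y'(V \setminus R(p) + p) \le y(V \setminus R(p) + p)$ since (using $y'(V) = y(V) = f(V)$)

$$y'(V \setminus R(p) + p) = y'(V) - y'(R(p) - p) = y(V) - y'(R(p) - p) \le y(V) - y(R(p) - p) = y(V \setminus R(p) + p).$$

But now since $y$ is non-degenerate, $\max_j y_j > 0$, and therefore

$$\begin{aligned} y'(V \setminus R(p)) &\le y(V \setminus R(p) + p) - y'_p \\ &= y(V \setminus R(p) + p) - \big(f(R(p)) - f(R(p) - p)\big) \qquad (15.3) \\ &\le n \max_j y_j - \big(f(R(p)) - f(R(p) - p)\big) \\ &< (n - n^4)\max_j y_j. \end{aligned}$$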

Therefore, by the Pigeonhole Principle (as $|V \setminus R(p)| \le n - 1$), some $q \notin R(p)$ must satisfy

$$\begin{aligned} y'_q &< \frac{(n - n^4)\max_j y_j}{n-1} = -(n^3 + n^2 + n)\max_j y_j \\ &\le -(n^3 + n^2 + n)\max_{j \notin R(p)} y_j \\ &\le -(n^3 + n^2 + n)\max_{j \notin R(p)} y'_j \qquad \text{by (15.2)}. \end{aligned}$$

By Lemma 77, this $q$ must be in any minimizer of $f'$. In other words, whenever $p$ is in a minimizer of $f$, then so is $q$.

Note, however, that computing all $y'^{(k)}$ would take $O(n^2)$ oracle calls in the worst case, as there are $O(n)$ of them. We use the following trick to identify a suitable $q$ using just $O(n)$ calls. The idea is that we actually only need a sufficient decrease in $y'(V \setminus R(p))$, which can be accomplished by a large corresponding decrease in a single $y'^{(k)}$.

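For each $k$, by the same argument as above (see (15.3)),

$$y'^{(k)}(V \setminus R(p)) - y^{(k)}(V \setminus R(p)) \le y^{(k)}_p - \big(f(R(p)) - f(R(p) - p)\big). \tag{15.4}$$

The “weighted decreases” $\lambda_k\big(y^{(k)}_p - (f(R(p)) - f(R(p) - p))\big)$ of the $y'^{(k)}$ sum up to

$$\sum_k \lambda_k\big(y^{(k)}_p - (f(R(p)) - f(R(p) - p))\big) = y_p - \big(f(R(p)) - f(R(p) - p)\big) < (1 - n^4)\max_j y_j.$$

Thus, by the Pigeonhole Principle, some $l$ will have

$$\lambda_l\big(y^{(l)}_p - (f(R(p)) - f(R(p) - p))\big) < \frac{(1 - n^4)\max_j y_j}{O(n)} < -n^2 \max_j y_j.$$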
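For this $l$ we compute $y'' = \sum_{k \ne l} \lambda_k y^{(k)} + \lambda_l y'^{(l)}$. We show that $y''$ has the same property as $y'$ above:

$$\begin{aligned} y''(V \setminus R(p)) &= y(V \setminus R(p)) + \lambda_l\big(y'^{(l)}(V \setminus R(p)) - y^{(l)}(V \setminus R(p))\big) \\ &\le y(V \setminus R(p)) + \lambda_l\big(y^{(l)}_p - (f(R(p)) - f(R(p) - p))\big) \qquad \text{by (15.4)} \\ &< (n-1)\max_j y_j - n^2 \max_j y_j \\ &\le (n - n^2)\max_j y_j. \end{aligned}$$

Then some $q \in V \setminus R(p)$ must satisfy

$$y''_q < \frac{n - n^2}{n-1}\max_j y_j = -n \max_j y_j.$$

That is, the arc $(p,q)$ is valid (by Lemma 77 applied to $f'$, since $y''_j \le y_j$ for $j \notin R(p)$). This takes $O(n)$ oracle calls since, given $y = \sum_k \lambda_k y^{(k)}$, computing $y''$ requires knowing only $f(R(p))$, $f(R(p) - p)$, and $y'^{(l)}$, which can be computed from $y^{(l)}$ with $n$ oracle calls. The runtime is $O(n^2)$, which is needed for computing $y''$. $\square$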
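The $O(n)$-oracle-call trick at the end of this proof can be sketched as follows: choose the index $l$ with the most negative weighted decrease, recompute only $y'^{(l)}$ by moving $R(p)$ to the front of its permutation, and read off $q$ as the smallest coordinate of $y''$ outside $R(p)$. This is an illustrative Python sketch with hypothetical names; for simplicity it recomputes the given BFS's $y^{(k)}$ from their permutations, whereas the lemma assumes they are already available. The toy instance does not satisfy the hypothesis $\mathrm{upper}(p) > n^4 \max_j y_j$, so it only exercises the mechanics ($y'^{(l)}_p = \mathrm{upper}(p)$ and (15.2)); it does not certify that the returned arc is valid.

```python
def cut_plus_modular(edges, w):
    """Hypothetical example: f(S) = (#edges cut by S) + w(S)."""
    def f(S):
        S = frozenset(S)
        return sum(1 for u, v in edges if (u in S) != (v in S)) + sum(w[e] for e in S)
    return f

def greedy_bfs(f, perm):
    h, prefix, prev = {}, frozenset(), f(frozenset())
    for e in perm:
        prefix = prefix | {e}
        cur = f(prefix)
        h[e], prev = cur - prev, cur
    return h

def reach(arcs, start):
    """Vertices reachable from start (its descendants), including start."""
    adj = {}
    for u, v in arcs:
        adj.setdefault(u, []).append(v)
    seen, stack = {start}, [start]
    while stack:
        for v in adj.get(stack.pop(), []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return frozenset(seen)

def lemma79_step(f, ground, arcs, combo, p):
    """combo: list of (lam, perm), lam > 0 summing to 1, every perm consistent
    with arcs. Returns (q, y2, h_new, h_old, up) following the proof of Lemma 79."""
    ground = list(ground)
    R = reach(arcs, p)                                   # R(p)
    bfs = [greedy_bfs(f, perm) for _, perm in combo]     # the given y^(k)
    y = {e: sum(lam * h[e] for (lam, _), h in zip(combo, bfs)) for e in ground}
    up = f(R) - f(R - {p})                               # upper(p)
    # Pigeonhole step: l minimizes lam_l * (y^(l)_p - upper(p)).
    l = min(range(len(combo)), key=lambda k: combo[k][0] * (bfs[k][p] - up))
    lam_l, perm_l = combo[l]
    # Move R(p) to the front, keeping relative order inside and outside R(p).
    moved = [e for e in perm_l if e in R] + [e for e in perm_l if e not in R]
    h_new = greedy_bfs(f, moved)                         # y'^(l): about n evaluations
    y2 = {e: y[e] + lam_l * (h_new[e] - bfs[l][e]) for e in ground}  # y''
    outside = [e for e in ground if e not in R]
    q = min(outside, key=y2.get) if outside else None
    return q, y2, h_new, bfs[l], up

f = cut_plus_modular([(0, 1), (1, 2), (2, 0), (2, 3)], [-2, 1, 0, -1])
arcs = [(0, 2), (3, 2)]
combo = [(0.5, [2, 0, 3, 1]), (0.5, [2, 3, 1, 0])]
q, y2, h_new, h_old, up = lemma79_step(f, range(4), arcs, combo, p=0)
```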

Lemma 80. Let $y = \sum_k \lambda_k y^{(k)}$ be a non-degenerate convex combination of base polyhedron BFS's $y^{(k)}$ which are consistent with every arc $(i,j) \in A$. If $\mathrm{lower}(p) < n^4 \min_j y_j$, then we can find, using $O(n)$ oracle calls (time $O(n \cdot \mathrm{EO})$) and $O(n^2)$ additional time, some $q \notin Q(p)$ such that the arc $(q,p)$ is valid, i.e. if $p$ is not in a minimizer, then $q$ is not either.


Proof. It is possible to follow the same recipe as in the proof of Lemma 79, but using Lemma 76 instead of Lemma 77. Here we offer a proof which directly invokes Lemma 79 on a different submodular function.

Let $g$ be defined by $g(S) = f(V \setminus S)$ for any $S$, and let $A_g$ be the set of arcs obtained by reversing the directions of the arcs of $A$. Consider the problem of minimizing $g$ over the ring family defined by $A_g$. Using subscripts to avoid confusion between $f$ and $g$ (e.g., $R_g(i)$ is the set of descendants of $i$ w.r.t. $A_g$), it is not hard to verify the following:

• $g$ is submodular

• $R_g(i) = Q_f(i)$

• $g(R_g(p)) - g(R_g(p) - p) = -\big(f(V \setminus Q_f(p) + p) - f(V \setminus Q_f(p))\big)$

• $-y^{(k)}$ is a BFS of $B(g)$ if and only if $y^{(k)}$ is a BFS of $B(f)$

• $\max_j (-y_j) = -\min_j y_j$

By using the above correspondence and applying Lemma 79 to $g$ and $A_g$, we can find, using $O(n)$ oracle calls and $O(n^2)$ time, some $q \notin R_g(p) = Q_f(p)$ such that the arc $(p,q)$ is valid for $g$ and $A_g$. In other words, the reversed arc $(q,p)$ will be valid for $f$ and $A$. $\square$
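The correspondence used in this proof is easy to check by brute force. The Python sketch below (hypothetical helpers, same toy function as before) builds $g(S) = f(V \setminus S)$ and the reversed arc set $A_g$, then verifies the third bullet, $\mathrm{upper}_g(p) = -\mathrm{lower}_f(p)$, and a permutation-level form of the fourth: the greedy BFS of $g$ for a permutation equals minus the greedy BFS of $f$ for the reversed permutation.

```python
from itertools import permutations

def cut_plus_modular(edges, w):
    """Hypothetical example: f(S) = (#edges cut by S) + w(S)."""
    def f(S):
        S = frozenset(S)
        return sum(1 for u, v in edges if (u in S) != (v in S)) + sum(w[e] for e in S)
    return f

def greedy_bfs(f, perm):
    h, prefix, prev = {}, frozenset(), f(frozenset())
    for e in perm:
        prefix = prefix | {e}
        cur = f(prefix)
        h[e], prev = cur - prev, cur
    return h

def reach(arcs, start):
    adj = {}
    for u, v in arcs:
        adj.setdefault(u, []).append(v)
    seen, stack = {start}, [start]
    while stack:
        for v in adj.get(stack.pop(), []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return frozenset(seen)

def upper_of(fun, arcs, p):
    """upper(p) = fun(R(p)) - fun(R(p) - p), with R taken w.r.t. arcs."""
    R = reach(arcs, p)
    return fun(R) - fun(R - {p})

def lower_of(fun, ground, arcs, p):
    """lower(p) = fun(V - Q(p) + p) - fun(V - Q(p)), with Q taken w.r.t. arcs."""
    V = frozenset(ground)
    Q = reach([(j, i) for i, j in arcs], p)
    return fun((V - Q) | {p}) - fun(V - Q)

f = cut_plus_modular([(0, 1), (1, 2), (2, 0), (2, 3)], [-2, 1, 0, -1])
V = frozenset(range(4))
arcs = [(0, 2), (3, 2)]
arcs_g = [(j, i) for i, j in arcs]   # A_g: every arc reversed

def g(S):
    """g(S) = f(V minus S)."""
    return f(V - frozenset(S))

# upper w.r.t. (g, A_g) equals minus lower w.r.t. (f, A).
ok_upper = all(upper_of(g, arcs_g, p) == -lower_of(f, V, arcs, p) for p in V)
# Greedy BFS of g for a permutation = minus greedy BFS of f for the reversed one.
ok_bfs = all(greedy_bfs(g, p) == {e: -v for e, v in greedy_bfs(f, p[::-1]).items()}
             for p in permutations(sorted(V)))
```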


These lemmas lay the foundation of our algorithm. They suggest that if the positive entries of a point in the base polyhedron are small relative to some $\mathrm{upper}(p) = f(R(p)) - f(R(p) - p)$, a new arc $(p,q)$ can be added to $A$. This can be seen as a robust version of Lemma 65.

Finally, we end the section with a technical lemma that will be used crucially in both of our algorithms. Its importance will become clear when it is invoked in our analyses.
Lemma 81. Let $h''$ denote a convex combination of two vectors $h$ and $h'$ in the base polyhedron, i.e. $h'' = \lambda h + (1-\lambda)h'$ for some $\lambda \in [0,1]$. Further suppose that

$$\|h''\|_2 \le \alpha \min\{\lambda\|h\|_2, (1-\lambda)\|h'\|_2\}$$

for some $\alpha \le \frac{1}{2\sqrt{n}}$. Then for $p = \arg\max_j \big(\max\{\lambda|h_j|, (1-\lambda)|h'_j|\}\big)$ we have

$$\mathrm{lower}(p) \le -\frac{1}{2\alpha\sqrt{n}}\|h''\|_2 \qquad \text{and} \qquad \mathrm{upper}(p) \ge \frac{1}{2\alpha\sqrt{n}}\|h''\|_2.$$


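Proof. Suppose without loss of generality that $\lambda|h_p| \ge (1-\lambda)|h'_p|$, so that $\lambda|h_p| \ge \lambda|h_j|$ for all $j$ and hence $\lambda\|h\|_2 \le \sqrt{n}\,\lambda|h_p|$. Then by assumption we have

$$\|h''\|_2 \le \alpha \min\{\lambda\|h\|_2, (1-\lambda)\|h'\|_2\} \le \alpha\sqrt{n}\,|\lambda h_p|.$$

However, since $\alpha \le \frac{1}{2\sqrt{n}}$ we see that

$$|\lambda h_p + (1-\lambda)h'_p| \le \|h''\|_2 \le \alpha\sqrt{n}\,|\lambda h_p| \le \frac{1}{2}|\lambda h_p|.$$

Consequently, $\lambda h_p$ and $(1-\lambda)h'_p$ have opposite signs and $|(1-\lambda)h'_p| \ge \frac{1}{2}|\lambda h_p|$. We then have

$$\mathrm{lower}(p) \le \min\{h_p, h'_p\} \le \min\{\lambda h_p, (1-\lambda)h'_p\} \le -\frac{1}{2}|\lambda h_p| \le -\frac{1}{2\alpha\sqrt{n}}\|h''\|_2$$

and

$$\mathrm{upper}(p) \ge \max\{h_p, h'_p\} \ge \max\{\lambda h_p, (1-\lambda)h'_p\} \ge \frac{1}{2}|\lambda h_p| \ge \frac{1}{2\alpha\sqrt{n}}\|h''\|_2.$$

$\square$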
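Lemma 81 can be illustrated numerically. In the sketch below (hypothetical names), we compute $p$ and the quantity $\min\{\lambda\|h\|_2, (1-\lambda)\|h'\|_2\}/(2\sqrt{n})$, which equals $\|h''\|_2/(2\alpha\sqrt{n})$ for the smallest admissible $\alpha$. Since $\mathrm{lower}(p) \le \min\{h_p, h'_p\}$ and $\mathrm{upper}(p) \ge \max\{h_p, h'_p\}$ by Lemma 78, checking the signs and sizes of $h_p$ and $h'_p$ against this bound exercises the conclusion of the lemma. The vectors are arbitrary test data, not BFS's of a particular $f$.

```python
import math

def norm2(v):
    return math.sqrt(sum(x * x for x in v))

def lemma81_index(h, hp, lam):
    """Return (p, bound) as in Lemma 81, or None if its hypothesis fails.
    bound = min(lam*||h||, (1-lam)*||h'||) / (2*sqrt(n))."""
    n = len(h)
    h2 = [lam * a + (1 - lam) * b for a, b in zip(h, hp)]      # h''
    denom = min(lam * norm2(h), (1 - lam) * norm2(hp))
    # Hypothesis: ||h''|| <= alpha * denom for some alpha <= 1/(2 sqrt n).
    if denom == 0 or norm2(h2) > denom / (2 * math.sqrt(n)):
        return None
    p = max(range(n), key=lambda j: max(lam * abs(h[j]), (1 - lam) * abs(hp[j])))
    return p, denom / (2 * math.sqrt(n))

h, hp, lam = [3, -1, -2], [-3, 1, 2.01], 0.5
p, bound = lemma81_index(h, hp, lam)
```

Here $h''$ is nearly zero, and the coordinates $h_p$ and $h'_p$ indeed have opposite signs and magnitude at least the bound.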


15.3 $\tilde{O}(n^4 \cdot \mathrm{EO} + n^5)$ Time Algorithm

Here we present an $\tilde{O}(n^4 \cdot \mathrm{EO} + n^5)$ time, i.e. strongly polynomial time, algorithm for SFM. We build upon the algorithm in this section to achieve a faster running time in Section 15.4.

Our new algorithm combines the existing tools for SFM developed over the last decade with 
our cutting plane method. While there are certain similarities with previous algorithms (especially 
[54, 60, 56]), our approach significantly departs from all the old approaches in one important aspect. 

All of the previous algorithms actively maintain a point in the base polyhedron and represent 
it as a convex combination of BFS’s. At each step, a new BFS may enter the convex combination 
and an old BFS may exit. Our algorithm, on the other hand, maintains only a collection of BFS’s 
(corresponding to our separating hyperplanes), rather than an explicit convex combination. A 
“good” convex combination is computed from the collection of BFS’s only after running Cutting 
Plane for enough iterations. We believe that this crucial difference is the fundamental reason which 
offers the speedup. This is achieved by the Cutting Plane method which considers the geometry 
of the collection of BFS’s. On the other hand, considering only a convex combination of BFS’s 
effectively narrows our sight to only one point in the base polyhedron. 

Overview 
Now we are ready to describe our strongly polynomial time algorithm. Similar to the weakly polynomial algorithm, we first run our cutting plane method for enough iterations on the initial feasible region $\{x \in [0,1]^n : x_i \le x_j \ \forall (i,j) \in A\}$, after which a pair of approximately parallel supporting hyperplanes $F_1, F_2$ of width $1/n^{\Theta(1)}$ can be found. Our strategy is to write $F_1$ and $F_2$ as nonnegative combinations of the facets of the remaining feasible region $P$. These combinations are made up of newly added separating hyperplanes as well as the inequalities $x_i \ge 0$, $x_j \le 1$ and $x_i \le x_j$. We then argue that one of the following updates can be done:

• Collapsing: $x_i = 0$, $x_j = 1$ or $x_i = x_j$

• Adding a new arc $(i,j)$: $x_i \le x_j$ for some $(i,j) \notin A$

The former case is easy to handle by elimination or contraction. If $x_i = 0$, we simply eliminate $i$ from the ground set $V$; and if $x_i = 1$, we redefine $f$ so that $f'(S) = f(S + i)$ for any $S \subseteq V - i$. The case $x_i = x_j$ can be handled in a similar fashion. In the latter case, we simply add the arc $(i,j)$ to $A$. We then repeat the same procedure on the new problem.

Roughly speaking, our strongly polynomial time guarantee follows as eliminations and contractions can happen at most $n$ times and at most $2\binom{n}{2}$ arcs can be added. While the whole picture is simple, numerous technical details come into play in the execution. We advise readers to keep this overview in mind when reading the subsequent sections.

Algorithm 

Our algorithm is summarized below. Again, we remark that our algorithm simply uses Theorem 82 regarding our cutting plane method and is agnostic as to how the cutting plane method works; thus it could be replaced with other methods, albeit at the expense of a slower runtime.

1. Run cutting plane on (15.1) (Theorem 82 with $r = O(1)$) using our modified separation oracle (Section 15.2.1).

2. Identify a pair of “narrow” approximately parallel supporting hyperplanes, or get some degenerate BFS $h$, i.e. $h \ge 0$ or $h \le 0$ (in which case $\emptyset$ or $V$, respectively, is a minimizer).

3. Deduce from the hyperplanes some new constraint of the form $x_i = 0$, $x_j = 1$, $x_i = x_j$ or $x_i \le x_j$ (Section 15.3.2).

4. Consolidate $A$ and $f$ (Section 15.3.1).

5. Repeat by running our cutting plane method on (15.1) with the updated $A$ and $f$. (Note that any previously found separating hyperplanes are discarded.)

We call step (1) a phase of cutting plane. The minimizer can be constructed by unraveling the 
recursion. 

15.3.1 Consolidating $A$ and $f$

Here we detail how the set of valid arcs $A$ and the submodular function $f$ should be updated once we deduce new information $x_i = 0$, $x_i = 1$, $x_i = x_j$ or $x_i \le x_j$. Recall that $R(i)$ and $Q(i)$ are the sets of descendants and ancestors of $i$ respectively (including $i$ itself). The changes below are somewhat self-evident and are actually used in some of the previous algorithms, so we only sketch how they are done without a detailed justification.

Changes to the digraph representation D of our ring family include: 
• $x_i = 0$: remove $Q(i)$ from the ground set, along with all the arcs incident to $Q(i)$

• $x_i = 1$: remove $R(i)$ from the ground set, along with all the arcs incident to $R(i)$

• $x_i = x_j$: contract $i$ and $j$ in $D$ and remove any duplicate arcs

• $x_i \le x_j$: insert the arc $(i,j)$ into $A$

• For the last two cases, we also contract the vertices on directed cycles of $A$ until none remain, and remove any duplicate arcs.

Here we can contract any cycle $(i_1, \ldots, i_k)$ because the inequalities $x_{i_1} \le x_{i_2}, \ldots, x_{i_{k-1}} \le x_{i_k}, x_{i_k} \le x_{i_1}$ imply $x_{i_1} = \cdots = x_{i_k}$.
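Contracting directed cycles until none remain is the same as contracting each strongly connected component of $D$ to a single vertex. The Python sketch below does this with Kosaraju's two-pass depth-first search (our choice; the text does not prescribe an algorithm) and removes duplicate arcs and self-loops; the function names are hypothetical, and the recursive search is only meant for small digraphs.

```python
def scc_representatives(nodes, arcs):
    """Kosaraju's algorithm: map each node to a representative of its
    strongly connected component (recursive; fine for small digraphs)."""
    adj = {u: [] for u in nodes}
    radj = {u: [] for u in nodes}
    for u, v in arcs:
        adj[u].append(v)
        radj[v].append(u)
    order, seen = [], set()

    def visit(u):                  # first pass: record finishing order
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                visit(v)
        order.append(u)

    for u in nodes:
        if u not in seen:
            visit(u)
    rep = {}

    def assign(u, root):           # second pass on the reversed digraph
        rep[u] = root
        for v in radj[u]:
            if v not in rep:
                assign(v, root)

    for u in reversed(order):
        if u not in rep:
            assign(u, u)
    return rep

def contract_cycles(nodes, arcs):
    """Contract every directed cycle; return (new_nodes, new_arcs, groups)."""
    rep = scc_representatives(nodes, arcs)
    groups = {}
    for u in nodes:
        groups.setdefault(rep[u], []).append(u)
    # Drop self-loops and duplicate arcs created by the contraction.
    new_arcs = sorted({(rep[u], rep[v]) for u, v in arcs if rep[u] != rep[v]})
    return sorted(groups), new_arcs, groups

nodes = [0, 1, 2, 3, 4]
arcs = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (1, 3)]
new_nodes, new_arcs, groups = contract_cycles(nodes, arcs)
```

The cycle $0 \to 1 \to 2 \to 0$ collapses into one vertex, and the arcs $(2,3)$ and $(1,3)$ merge into a single arc.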

Changes to $f$:

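• $x_i = 0$: replace $f$ by $f' : 2^{V \setminus Q(i)} \to \mathbb{R}$, $f'(S) = f(S)$ for $S \subseteq V \setminus Q(i)$

• $x_i = 1$: replace $f$ by $f' : 2^{V \setminus R(i)} \to \mathbb{R}$, $f'(S) = f(S \cup R(i))$ for $S \subseteq V \setminus R(i)$

• $x_i = x_j$: see below

• $x_i \le x_j$: no changes to $f$ are needed if it does not create a cycle in $A$; otherwise see below

• Contraction of $C = \{i_1, \ldots, i_k\}$ into a new element $l$: replace $f$ by $f' : 2^{(V \setminus C) + l} \to \mathbb{R}$, with $f'(S) = f(S)$ for $S \subseteq V \setminus C$ and $f'(S) = f((S - l) \cup C)$ for $S \ni l$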

Strictly speaking, these changes are in fact not needed as they will automatically be taken care of 
by our cutting plane method. Nevertheless, performing them lends a more natural formulation of 
the algorithm and simplifies its description. 
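The updates to $f$ are simple wrappers around the original function. A minimal Python sketch (hypothetical names; the label `"c"` for the contracted element plays the role of $l$), with a brute-force submodularity check on a tiny example:

```python
from itertools import combinations

def cut_plus_modular(edges, w):
    """Hypothetical example: f(S) = (#edges cut by S) + w(S)."""
    def f(S):
        S = frozenset(S)
        return sum(1 for u, v in edges if (u in S) != (v in S)) + sum(w[e] for e in S)
    return f

def fix_to_one(f, R):
    """x_i = 1: f'(S) = f(S | R(i)) for S inside V - R(i)."""
    R = frozenset(R)
    return lambda S: f(frozenset(S) | R)

def contract(f, C, new):
    """Contract C into the fresh element `new` (hypothetical label):
    f'(S) = f(S) if new is not in S, else f((S - {new}) | C)."""
    C = frozenset(C)
    def g(S):
        S = frozenset(S)
        return f((S - {new}) | C) if new in S else f(S)
    return g

def is_submodular(f, ground):
    """Brute force: f(A) + f(B) >= f(A | B) + f(A & B) for all A, B."""
    subsets = [frozenset(c) for k in range(len(ground) + 1)
               for c in combinations(ground, k)]
    return all(f(A) + f(B) >= f(A | B) + f(A & B) for A in subsets for B in subsets)

f = cut_plus_modular([(0, 1), (1, 2), (2, 0), (2, 3)], [-2, 1, 0, -1])
f1 = fix_to_one(f, {0, 2})      # e.g. x_0 = 1 with R(0) = {0, 2}
g = contract(f, {0, 1}, "c")    # x_0 = x_1: merge 0 and 1 into "c"
```

Note that `fix_to_one` gives $f'(\emptyset) = f(R(i))$, which may be nonzero; subtracting this constant does not change the minimizers.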

15.3.2 Deducing New Constraints $x_i = 0$, $x_j = 1$, $x_i = x_j$ or $x_i \le x_j$

Here we show how to deduce new constraints through the result of our cutting plane method. This 
is the most important ingredient of our algorithm. As mentioned before, similar arguments were 
used first by IFF [56] and later in [54, 60]. There are however two important differences for our 
method: 

• We maintain a collection of BFS's rather than a convex combination; a convex combination is computed and needed only after each phase of cutting plane.

• As a result, our results are proved mostly geometrically whereas the previous ones were proved 
mostly combinatorially. 

Our ability to deduce such information hinges on the power of the cutting plane method in Part I. We restate our main result, Theorem 31, in the language of SFM. Note that Theorem 82 is formulated in a fairly general manner in order to accommodate the next section. Readers may wish to think of $r = O(1)$ for now.

Theorem 82 (Theorem 31 restated for SFM). For any $r > 100$, applying our cutting plane method, Theorem 31, to (15.1) with our modified separation oracle (or its variant in Section 15.4), with high probability in $n$ either

1. Finds a degenerate BFS $h \ge 0$ or $h \le 0$.
2. Finds a polytope $P$ consisting of $O(n)$ constraints which are our separating hyperplanes or the constraints in (15.1). Moreover, $P$ satisfies the following inequalities

$$c^\top x \le M \qquad \text{and} \qquad c'^\top x \le M',$$

both of which are nonnegative combinations of the constraints of $P$, where $\|c + c'\|_2 \le \min\{\|c\|_2, \|c'\|_2\}/n^{\Theta(r)}$ and $|M + M'| \le \min\{\|c\|_2, \|c'\|_2\}/n^{\Theta(r)}$.

Furthermore, the algorithm runs in expected time $O(n^2 r \log n \cdot \mathrm{EO} + n^3 r^{O(1)} \log^{O(1)} n)$.

Proof. In applying Theorem 31 we let $K$ be the set of minimizers of $f$ over the ring family, and the box is the hypercube with $R = 1$. We run cutting plane with our modified separation oracle (Lemma 75). The initial polytope can be chosen to be, say, the hypercube. If some separating hyperplane is degenerate, then we have the desired result (and know that either $\emptyset$ or $V$ is optimal). Otherwise let $P$ be the current feasible region. Note that $P \ne \emptyset$, because the minimizers of $f$ all lie in the initial polytope $P^{(0)}$ and in every subsequent polytope $P^{(k)}$, as they are never cut away by the separating hyperplanes.

Let $S$ be the collection of inequalities (15.1) as well as the separating hyperplanes $h^\top x \le f(x_h) = h^\top x_h$ used. By Theorem 31, all of our minimizers will be contained in $P$, consisting of $O(n)$ constraints $Ax \ge b$. Each such constraint $a_i^\top x \ge b_i$ is a scaling and shifting of some inequality $p_i^\top x \ge q_i$ in $S$, i.e. $a_i = p_i/\|p_i\|_2$ and $b_i \le q_i/\|p_i\|_2$.

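By taking $\epsilon = 1/n^{\Theta(r)}$ with a sufficiently large constant in the $\Theta$, our theorem certifies that $P$ has a narrow width by exhibiting $a_1$, a nonnegative combination with coefficients $t_i \ge 0$, and a point $x_0 \in P$ with $\|x_0\|_\infty \le 3\sqrt{n}R = 3\sqrt{n}$ satisfying the following:

$$\Big\|a_1 + \sum_{i=2}^{O(n)} t_i a_i\Big\|_2 \le \frac{1}{n^{\Theta(r)}},$$

$$0 \le a_1^\top x_0 - b_1 \le \frac{1}{n^{\Theta(r)}},$$

$$0 \le \Big(\sum_{i=2}^{O(n)} t_i a_i\Big)^\top x_0 - \sum_{i=2}^{O(n)} t_i b_i \le \frac{1}{n^{\Theta(r)}}.$$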

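We convert these inequalities to $p$ and $q$. Let $t'_i := t_i \cdot \|p_1\|_2/\|p_i\|_2 \ge 0$. Then

$$\Big\|p_1 + \sum_{i=2}^{O(n)} t'_i p_i\Big\|_2 \le \frac{\|p_1\|_2}{n^{\Theta(r)}},$$

$$0 \le p_1^\top x_0 - q_1 \le \frac{\|p_1\|_2}{n^{\Theta(r)}},$$

$$0 \le \Big(\sum_{i=2}^{O(n)} t'_i p_i\Big)^\top x_0 - \sum_{i=2}^{O(n)} t'_i q_i \le \frac{\|p_1\|_2}{n^{\Theta(r)}}.$$

We claim that $c = -p_1$, $M = -q_1$, $c' = -\sum_{i=2}^{O(n)} t'_i p_i$, and $M' = -\sum_{i=2}^{O(n)} t'_i q_i$ satisfy our requirement.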

Footnote: Minus signs are needed because we express our inequalities as, e.g., $h^\top x \le h^\top x_h$, whereas in Theorem 31, $a_i^\top x \ge b_i$ is used. We apologize for the inconvenience.
We first show that $\|c + c'\|_2 \le \min\{\|c\|_2, \|c'\|_2\}/n^{\Theta(r)}$. We have $\|c + c'\|_2 \le \|c\|_2/n^{\Theta(r)}$ from the first inequality. If $\|c\|_2 \le \|c'\|_2$ we are done. Otherwise, by the triangle inequality

$$\|c\|_2 - \|c'\|_2 \le \|c + c'\|_2 \le \frac{\|c\|_2}{n^{\Theta(r)}} \le \frac{\|c\|_2}{2} \implies 2\|c'\|_2 \ge \|c\|_2,$$

and hence $\|c + c'\|_2 \le \|c\|_2/n^{\Theta(r)} \le 2\|c'\|_2/n^{\Theta(r)} = \|c'\|_2/n^{\Theta(r)}$.

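We also need to prove $|M + M'| \le \min\{\|c\|_2, \|c'\|_2\}/n^{\Theta(r)}$. Summing the second and third inequalities,

$$-\frac{\|c\|_2}{n^{\Theta(r)}} \le (c + c')^\top x_0 - (M + M') \le 0.$$

Recall that we have $\|x_0\|_\infty \le 3\sqrt{n}$, so $|(c + c')^\top x_0| \le \|c + c'\|_2 \|x_0\|_2 \le 3n\|c + c'\|_2$. Then

$$\begin{aligned} |M + M'| &\le |(c + c')^\top x_0 - (M + M')| + |(c + c')^\top x_0| \\ &\le \frac{\|c\|_2}{n^{\Theta(r)}} + 3n\|c + c'\|_2 \\ &\le \frac{\|c\|_2}{n^{\Theta(r)}} + \frac{3n\|c\|_2}{n^{\Theta(r)}} \\ &= \frac{\|c\|_2}{n^{\Theta(r)}} \end{aligned}$$

as desired. Our result then follows as we proved $2\|c'\|_2 \ge \|c\|_2$.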

Finally, we have the desired runtime as our modified separation oracle runs in time $O(n \cdot \mathrm{EO} + \log^{O(1)} n)$. $\square$

Informally, the theorem above simply states that after $O(nr \log n)$ iterations of cutting plane, the remaining feasible region $P$ can be sandwiched between two approximately parallel supporting hyperplanes of width $1/n^{\Theta(r)}$. A good intuition to keep in mind is that every $O(n)$ iterations of cutting plane reduce the minimum width by a constant factor.

Remark 83. As shown in the proof of Theorem 82, one of the two approximately parallel hyperplanes can actually be chosen to be a constraint of our feasible region $P$. However, we do not exploit this property as it does not seem to help us and would break the notational symmetry between $c$ and $c'$.

Setup 

In each phase, we run cutting plane using Theorem 82 with $r = O(1)$. If some separating hyperplane used is degenerate, we have found a minimizer by Lemma 65.

Now assume none of the separating hyperplanes is degenerate. By Theorem 82, $P$ is sandwiched between a pair of approximately parallel supporting hyperplanes $F, F'$ which are $1/(10n^{10})$ apart. The width here can actually be $1/n^c$ for any constant $c$ by taking a sufficiently large constant in the $\Theta$.

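Here, we show how to deduce from $F$ and $F'$ some constraint $x_i = 0$, $x_j = 1$, $x_i = x_j$, or $x_i \le x_j$ on the minimizers of $f$ over the ring family. Let

$$c^\top x = \sum_i c_i x_i \le M \qquad \text{and} \qquad c'^\top x = \sum_i c'_i x_i \le M'$$

be the inequalities for $F$ and $F'$ such that

$$|M + M'|,\ \|c + c'\|_2 \le \mathrm{gap}, \qquad \text{where } \mathrm{gap} = \frac{1}{10n^{10}}\min\{\|c\|_2, \|c'\|_2\}.$$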

By the same theorem we can write $c^\top x \le M$ as a nonnegative combination of the constraints for $P$. Recall that the constraints for $P$ take on four different forms: (1) $-x_i \le 0$; (2) $x_j \le 1$; (3) $-(x_j - x_i) \le 0$; (4) $h^\top x = \sum_i h_i x_i \le f(x_h)$. Here the first three types are present initially, whereas the last type consists of the separating hyperplanes added. As alleged previously, the coefficient vector $h$ corresponds to a BFS of the base polyhedron for $f$. Our analysis crucially exploits this property.

Thus suppose $\sum_i c_i x_i \le M$ is a nonnegative combination of our constraints with weights $\alpha_i, \beta_j, \gamma_{ij}, \lambda_h \ge 0$. The number of (positive) $\alpha_i, \beta_j, \gamma_{ij}, \lambda_h$ is at most $O(n)$. Here we denote separating hyperplanes by $h^\top x \le f(x_h)$. Let $H$ be the set of BFS's used to construct separating hyperplanes.


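$$c^\top x = -\sum_i \alpha_i x_i + \sum_j \beta_j x_j + \sum_{(i,j) \in A} \gamma_{ij}(x_i - x_j) + \sum_{h \in H} \lambda_h h^\top x \quad \text{and} \quad M = \sum_j \beta_j + \sum_{h \in H} \lambda_h f(x_h). \tag{15.5}$$

Similarly, we write the inequality for $F'$ as a nonnegative combination of the constraints for $P$, where the number of (positive) $\alpha'_i, \beta'_j, \gamma'_{ij}, \lambda'_h$ is $O(n)$:

$$c'^\top x = -\sum_i \alpha'_i x_i + \sum_j \beta'_j x_j + \sum_{(i,j) \in A} \gamma'_{ij}(x_i - x_j) + \sum_{h \in H} \lambda'_h h^\top x \quad \text{and} \quad M' = \sum_j \beta'_j + \sum_{h \in H} \lambda'_h f(x_h). \tag{15.6}$$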

We also scale $c, c', \alpha, \alpha', \beta, \beta', \gamma, \gamma', \lambda, \lambda'$ so that

$$\sum_{h \in H} (\lambda_h + \lambda'_h) = 1,$$

as this does not change any of our preceding inequalities regarding $F$ and $F'$.

Now that $F, F'$ have been written as combinations of our constraints, we have gathered the necessary ingredients to derive our new arc. We first give a geometric intuition for why we would expect to be able to derive a new constraint. Consider the nonnegative combination making up $F$. We think of the coefficient $\beta_j$ as the contribution of $x_j \le 1$ to $F$. Now if $\beta_j$ is very large, $F$ is “very parallel” to $x_j \le 1$, and consequently $F'$ would miss $x_j = 0$ as the gap between $F$ and $F'$ is small. $P$ would then miss $x_j = 0$ too, as it is sandwiched between $F$ and $F'$. Similarly, a large $\alpha_i$ and a large $\gamma_{ij}$ would respectively imply that $x_i = 1$ and $(x_i = 0, x_j = 1)$ would be missed. The same argument works for $F'$ as well.

But on the other hand, if the contributions from $x_i \ge 0$, $x_j \le 1$, $x_i \le x_j$ to both $F$ and $F'$ are small, then the supporting hyperplanes $c^\top x \le M$ and $c'^\top x \le M'$ would be mostly made up of separating hyperplanes $h^\top x \le f(x_h)$. By summing up these separating hyperplanes (whose coefficients form BFS's), we would then get a point in the base polyhedron which is very close to the origin $0$. Moreover, by Lemma 81 and Lemma 79 we should then be able to deduce some interesting information about the minimizer of $f$ over $D$.

The rest of this section is devoted to realizing the vision sketched above. We stress that while 
the algebraic manipulations may be long, they are simply the execution of this elegant geometric 
picture. 

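Now, consider the following weighted sum of the inequalities $h^\top x \le f(x_h)$:

$$\Big(\sum_{h \in H} \lambda_h h^\top + \sum_{h \in H} \lambda'_h h^\top\Big) x \le \sum_{h \in H} \lambda_h f(x_h) + \sum_{h \in H} \lambda'_h f(x_h).$$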
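Observe that $\sum_{h \in H} \lambda_h h + \sum_{h \in H} \lambda'_h h$ is in the base polyhedron, since it is a convex combination of BFS's $h$. Furthermore, using (15.5) and (15.6) this can also be written as

$$\Big(\sum_{h \in H} \lambda_h h^\top + \sum_{h \in H} \lambda'_h h^\top\Big) x = \Big(c^\top x + \sum_i \alpha_i x_i - \sum_j \beta_j x_j + \sum_{(i,j) \in A} \gamma_{ij}(x_j - x_i)\Big) + \Big(c'^\top x + \sum_i \alpha'_i x_i - \sum_j \beta'_j x_j + \sum_{(i,j) \in A} \gamma'_{ij}(x_j - x_i)\Big)$$

and

$$\sum_{h \in H} \lambda_h f(x_h) + \sum_{h \in H} \lambda'_h f(x_h) = \Big(M - \sum_j \beta_j\Big) + \Big(M' - \sum_j \beta'_j\Big) = (M + M') - \sum_j \beta_j - \sum_j \beta'_j. \tag{15.7}$$

Furthermore, we can bound $c^\top x + c'^\top x$ by $c^\top x + c'^\top x \ge -\|c + c'\|_1 \ge -\sqrt{n}\|c + c'\|_2 \ge -\sqrt{n}\,\mathrm{gap}$, as $0 \le x \le 1$. Since $M + M' \le \mathrm{gap}$, we obtain

$$\mathrm{LHS} := \sum_i (\alpha_i + \alpha'_i) x_i - \sum_j (\beta_j + \beta'_j) x_j + \sum_{(i,j) \in A} (\gamma_{ij} + \gamma'_{ij})(x_j - x_i) \le 2\sqrt{n}\,\mathrm{gap} - \sum_j \beta_j - \sum_j \beta'_j.$$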

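Geometrically, the next lemma states that if the contribution from, say, $x_i \ge 0$ to $F$ is too large, then $F'$ would be forced to miss $x_i = 1$ because they are close to one another.

Lemma 84. Suppose $x$ satisfies (15.1) and $\mathrm{LHS} \le 2\sqrt{n}\,\mathrm{gap} - \sum_j \beta_j - \sum_j \beta'_j$ with $\alpha_i, \beta_j, \gamma_{ij}, \alpha'_i, \beta'_j, \gamma'_{ij} \ge 0$.

1. If $\alpha_i > 2\sqrt{n}\,\mathrm{gap}$ or $\alpha'_i > 2\sqrt{n}\,\mathrm{gap}$, then $x_i < 1$.

2. If $\beta_j > 2\sqrt{n}\,\mathrm{gap}$ or $\beta'_j > 2\sqrt{n}\,\mathrm{gap}$, then $x_j > 0$.

3. If $\gamma_{ij} > 2\sqrt{n}\,\mathrm{gap}$ or $\gamma'_{ij} > 2\sqrt{n}\,\mathrm{gap}$, then $0 \le x_j - x_i < 1$.

Proof. We only prove it for the unprimed coefficients, as the other case follows by symmetry.

Using $0 \le x \le 1$ and $x_i \le x_j$ for $(i,j) \in A$, we have $\mathrm{LHS} \ge \alpha_i x_i - \sum_k \beta_k - \sum_k \beta'_k$. Hence $\alpha_i x_i \le 2\sqrt{n}\,\mathrm{gap}$, and we get $x_i < 1$ if $\alpha_i > 2\sqrt{n}\,\mathrm{gap}$.

Similarly, $\mathrm{LHS} \ge -\beta_k x_k - \sum_{j \ne k} \beta_j - \sum_j \beta'_j$, which gives $-\beta_k x_k \le 2\sqrt{n}\,\mathrm{gap} - \beta_k$. Then $x_k > 0$ if $\beta_k > 2\sqrt{n}\,\mathrm{gap}$.

Finally, $\mathrm{LHS} \ge \gamma_{ij}(x_j - x_i) - \sum_k \beta_k - \sum_k \beta'_k$, which gives $\gamma_{ij}(x_j - x_i) \le 2\sqrt{n}\,\mathrm{gap}$. Then $x_j - x_i < 1$ if $\gamma_{ij} > 2\sqrt{n}\,\mathrm{gap}$. We have $x_i \le x_j$ since $(i,j) \in A$. $\square$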


So if any condition of Lemma 84 holds, we can set $x_i = 0$, $x_j = 1$ or $x_i = x_j$, since our problem (15.1) has an integral minimizer and any minimizer of $f$ is never cut away by Lemma 75. Consequently, in this case we can reduce the dimension by at least 1. From now on we may assume that

$$\max\{\alpha_i, \alpha'_i, \beta_j, \beta'_j, \gamma_{ij}, \gamma'_{ij}\} \le 2\sqrt{n}\,\mathrm{gap}. \tag{15.8}$$

Geometrically, (15.8) says that if the supporting hyperplanes are both mostly made up of the 
separating hyperplanes, then their aggregate contributions to F and F' should be small in absolute 
value. 
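Checking the conditions of Lemma 84 is a single pass over the $O(n)$ positive weights. A Python sketch, assuming (hypothetically) that the weights of each combination are stored in dictionaries keyed by element or by arc; the function name and tags are illustrative:

```python
import math

def lemma84_deductions(weights_F, weights_Fp, gap, n):
    """Each weights dict may hold 'alpha' (i -> alpha_i), 'beta' (j -> beta_j)
    and 'gamma' ((i, j) -> gamma_ij). Returns the set of deductions
    ('x_i=0', i), ('x_j=1', j), ('x_i=x_j', (i, j)) licensed by Lemma 84."""
    t = 2 * math.sqrt(n) * gap          # threshold 2 sqrt(n) gap
    out = set()
    for w in (weights_F, weights_Fp):
        # alpha_i large => x_i < 1 on P => x_i = 0 for integral points.
        out |= {("x_i=0", i) for i, a in w.get("alpha", {}).items() if a > t}
        # beta_j large => x_j > 0 on P => x_j = 1 for integral points.
        out |= {("x_j=1", j) for j, b in w.get("beta", {}).items() if b > t}
        # gamma_ij large => 0 <= x_j - x_i < 1 => x_i = x_j for integral points.
        out |= {("x_i=x_j", ij) for ij, g in w.get("gamma", {}).items() if g > t}
    return out

F = {"alpha": {0: 0.5, 1: 0.01}, "beta": {2: 0.02}}
Fp = {"gamma": {(1, 3): 0.3}}
found = lemma84_deductions(F, Fp, gap=0.01, n=4)
```

An empty result means that (15.8) holds.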

The next lemma identifies some $p \in V$ for which $f(R(p)) - f(R(p) - p)$ is “big”. This prepares for the final step of our approach, which invokes Lemma 79.
Lemma 85. Let $y = \sum_{h \in H} \lambda_h h$ and $y' = \sum_{h \in H} \lambda'_h h$. If $p = \arg\max_j \{\max\{|y_j|, |y'_j|\}\}$, then

$$\mathrm{upper}(p) \ge n^7\|y + y'\|_2,$$

assuming (15.8).

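Proof. Recall that $\|c + c'\|_2 \le \mathrm{gap}$, where $\mathrm{gap} = \frac{1}{10n^{10}}\min\{\|c\|_2, \|c'\|_2\}$, and

$$c = y - \sum_i \alpha_i 1_i + \sum_j \beta_j 1_j + \sum_{(i,j) \in A} \gamma_{ij}(1_i - 1_j) \quad \text{and} \quad c' = y' - \sum_i \alpha'_i 1_i + \sum_j \beta'_j 1_j + \sum_{(i,j) \in A} \gamma'_{ij}(1_i - 1_j).$$

By (15.8) we know that $\|c - y\|_2 \le 4n^2\,\mathrm{gap} \le \frac{1}{2}\|c\|_2$ and $\|c' - y'\|_2 \le 4n^2\,\mathrm{gap} \le \frac{1}{2}\|c'\|_2$. Consequently, by the triangle inequality we have that

$$\|y + y'\|_2 \le \|c + c'\|_2 + \|c - y\|_2 + \|c' - y'\|_2 \le 9n^2\,\mathrm{gap}$$

and

$$\|c\|_2 \le \|c - y\|_2 + \|y\|_2 \le \frac{1}{2}\|c\|_2 + \|y\|_2 \implies \|c\|_2 \le 2\|y\|_2.$$

Similarly, we have that $\|c'\|_2 \le 2\|y'\|_2$. Consequently, since $\mathrm{gap} \le \frac{1}{10n^{10}}\min\{\|c\|_2, \|c'\|_2\}$, we have that

$$\|y + y'\|_2 \le \frac{2}{n^8}\min\{\|y\|_2, \|y'\|_2\},$$

and thus invoking Lemma 81 yields the result. $\square$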

We summarize the results in the lemma below. 


Corollary 86. Let $P$ be the feasible region after running cutting plane on (15.1). Then one of the following holds:

1. We found a degenerate BFS, and hence either $\emptyset$ or $V$ is a minimizer.

2. The integral points of $P$ all lie on some hyperplane $x_i = 0$, $x_j = 1$ or $x_i = x_j$, which we can find.

3. Let $H$ be the collection of BFS's $h$ used to construct our separating hyperplanes for $P$. Then there is a convex combination $y$ of $H$ such that $n^4|y_i| < \max_p \mathrm{upper}(p)$ for all $i$.

Proof. As mentioned before, (1) happens if some separating hyperplane is degenerate. We have (2) if one of the conditions in Lemma 84 holds. Otherwise, $y = \sum_{h \in H} \lambda_h h + \sum_{h \in H} \lambda'_h h$ is a candidate for Case 3 by Lemma 85. $\square$

Let us revisit the conditions of Lemma 79 and explain that they are satisfied in Case 3 of the last corollary.

• $y$ is a convex combination of at most $O(n)$ BFS's. This holds in Case 3 since our current feasible region consists of only $O(n)$ constraints, thanks to the cutting plane method.

• Those BFS's must be consistent with every arc of $A$. This holds because Case 3 uses the BFS's for constructing our separating hyperplanes, and our modified separation oracle guarantees that they are consistent with $A$.

Thus in Case 3 of the last corollary, Lemma 79 allows us to deduce a new constraint $x_p \le x_q$ for some $q \notin R(p)$.
15.3.3 Running Time

Here we bound the total running time of our algorithm and prove the following. 

Theorem 87. Our algorithm runs in time $O(n^4 \log n \cdot \mathrm{EO} + n^5 \log^{O(1)} n)$.

Proof. To avoid being repetitive, we appeal to Corollary 86. Each phase of cutting plane takes time $O(n^2 \log n \cdot \mathrm{EO} + n^3 \log^{O(1)} n)$ (Theorem 82 with $r$ being a big constant). Given $F$ and $F'$ represented as nonnegative combinations of facets, we can check for the conditions in Lemma 84 in $O(n)$ time, as there are only this many facets of $P$. This settles Case 2 of Corollary 86. Finally, Lemma 79 tells us that we can find a new arc in $O(n \cdot \mathrm{EO} + n^2)$ time for Case 3 of Corollary 86. Our conclusion follows from the fact that we can get $x_i = 0$, $x_i = 1$, $x_i = x_j$ at most $n$ times and $x_i \le x_j$ at most $O(n^2)$ times. $\square$
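The count in the last sentence of the proof can be written out explicitly. The following restates the proof's arithmetic, with the per-phase cost taken from Theorem 82 for constant $r$, the $O(n)$ check of Lemma 84, and the per-arc cost of Lemma 79; each phase produces at least one collapse or one new arc.

```latex
\underbrace{\Big(n + 2\tbinom{n}{2}\Big)}_{\text{phases} \,=\, O(n^2)}
\cdot \Big[\underbrace{O(n^2\log n\cdot\mathrm{EO} + n^3\log^{O(1)} n)}_{\text{cutting plane}}
+ \underbrace{O(n)}_{\text{Lemma 84}}
+ \underbrace{O(n\cdot\mathrm{EO} + n^2)}_{\text{Lemma 79}}\Big]
= O(n^4\log n\cdot\mathrm{EO} + n^5\log^{O(1)} n).
```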

15.4 $\tilde{O}(n^3 \cdot \mathrm{EO} + n^4)$ Time Algorithm

Here we show how to improve our running time for strongly polynomial SFM to $\tilde{O}(n^3 \cdot \mathrm{EO} + n^4)$. Our algorithm can be viewed as an extension of the algorithm we presented in the previous Section 15.3. The main bottleneck of our previous algorithm was the time needed to identify a new arc, which cost us $\tilde{O}(n^2 \cdot \mathrm{EO} + n^3)$. Here we show how to reduce our amortized cost for identifying a valid arc down to $\tilde{O}(n \cdot \mathrm{EO} + n^2)$ and thereby achieve our result.

The key observation we make to improve this running time is that our choice of $p$ for adding an arc in the previous lemma can be relaxed: $p$ actually need not be $\arg\max_i \mathrm{upper}(i)$; instead it is enough to have $\mathrm{upper}(p) > \max\{\alpha_i, \beta_j, \beta'_j, \gamma_{ij}, \gamma'_{ij}\}$. For each such $p$ a new constraint $x_p \le x_q$ can be identified via Lemma 79. So if there are many $p$'s satisfying this, we will be able to obtain many new constraints and hence new valid arcs $(p,q)$.

On the other hand, the bound in Lemma 85 says that our point in the base polyhedron is small in absolute value. This is actually stronger than what we need in Lemma 79, which requires only its positive entries to be “small”. However, as we saw in Lemma 80, we can generate a constraint of the form $x_q \le x_p$ whenever $\mathrm{lower}(p)$ is sufficiently negative.

Using this idea, we divide V into different buckets according to upper(p) and lower(p). This 
will allow us to get a speedup for two reasons. 

First, bucketing allows us to disregard unimportant elements of $V$ during certain executions of our cutting plane method. If both $\mathrm{upper}(i)$ and $\mathrm{lower}(i)$ are small in absolute value, then $i$ is essentially negligible, because for a separating hyperplane $h^\top x \le f(x)$, any $h_i \in [\mathrm{lower}(i), \mathrm{upper}(i)]$ small in absolute value would not really make a difference. We can then run our cutting plane algorithm only on those non-negligible $i$'s, thereby reducing our time complexity. Of course, whether $h_i$ is small is relative. This suggests that partitioning the ground set by the relative size of $\mathrm{upper}(i)$ and $\mathrm{lower}(i)$ is a good idea.

Second, bucketing allows us to ensure that we can always add arcs for many elements simultaneously. Recall that we remarked that all we want is $n^{O(1)}|y_i| < \mathrm{upper}(p)$ for some $y$ in the base polyhedron; this would be sufficient to identify a new valid arc $(p,q)$. Now if the marginal differences $\mathrm{upper}(p)$ and $\mathrm{upper}(p')$ are close in value, knowing $n^{O(1)}|y_i| < \mathrm{upper}(p)$ would effectively give us the same for $p'$ for free. This suggests that elements with similar marginal differences should be grouped together.

The remainder of this section simply formalizes these ideas. In Section 15.4.1 we discuss how 
we partition the ground set V. In Section 15.4.2, we present our cutting plane method on a subset 
of the coordinates. Then in Section 15.4.3 we show how we find new arcs. Finally, in Section 15.4.4 
we put all of this together to achieve our desired running time. 
15.4.1 Partitioning Ground Set into Buckets

We partition the ground set $V$ into different buckets according to the values of $\mathrm{upper}(i)$ and $\mathrm{lower}(i)$. This is reminiscent of Iwata-Orlin's algorithm [60], which considers elements with big $\mathrm{upper}(i)$. However, they did not need to do bucketing by size or to consider $\mathrm{lower}(i)$, whereas these seem necessary for our algorithm.

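Let $N = \max_i\{\max\{\mathrm{upper}(i), -\mathrm{lower}(i)\}\}$ be the largest marginal difference in absolute value. By Lemma 78, $N \ge 0$. We partition our ground set $V$ as follows:

$$B_1 = \{i : \mathrm{upper}(i) \ge N/n^{10} \text{ or } \mathrm{lower}(i) \le -N/n^{10}\},$$

$$B_k = \{i \notin B_1 \cup \cdots \cup B_{k-1} : N/n^{10k} \le \mathrm{upper}(i) < N/n^{10(k-1)} \text{ or } -N/n^{10(k-1)} < \mathrm{lower}(i) \le -N/n^{10k}\}, \quad k \ge 2.$$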

We call the sets $B_k$ buckets. Our buckets group elements by the values of $\mathrm{upper}(i)$ and $\mathrm{lower}(i)$ at a fixed multiplicative “precision”. There are two cases.

• Case 1: the number of buckets is at most $\log n$, in which case $\mathrm{upper}(i) \ge N/n^{O(\log n)}$ or $\mathrm{lower}(i) \le -N/n^{O(\log n)}$ for all $i$.

• Case 2: there is some $k$ for which $|B_1 \cup \cdots \cup B_k| \ge |B_{k+1}|$.

This is because if there is no such $k$ in Case 2, then by induction each bucket $B_{k+1}$ has at least $2^k|B_1| \ge 2^k$ elements, and hence $k \le \log n$.
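The bucketing and the search for a “trough” index $k$ can be sketched in Python as follows. The exponent `c = 10` mirrors the thresholds $N/n^{10k}$ above; an element goes to the first bucket $k$ with $\max\{\mathrm{upper}(i), -\mathrm{lower}(i)\} \ge N/n^{ck}$, which is equivalent to the definition of $B_k$. Integer (or `fractions.Fraction`) inputs keep the comparisons exact. The function names, the `max_k` guard, and the convention that `find_trough` only considers $k$ below the last nonempty bucket are our own illustrative choices.

```python
def bucket_index(m, N, n, c=10, max_k=10_000):
    """Smallest k >= 1 with m >= N / n**(c*k), tested as m * n**(c*k) >= N."""
    if m <= 0:
        return None                  # never reaches any threshold
    k = 1
    while m * n ** (c * k) < N:
        k += 1
        if k > max_k:                # guard, e.g. for n = 1
            return None
    return k

def make_buckets(upper, lower, c=10):
    """Return ({k: B_k}, N) for dicts upper, lower keyed by element."""
    n = len(upper)
    m = {i: max(upper[i], -lower[i]) for i in upper}
    N = max(m.values())
    buckets = {}
    for i, mi in m.items():
        k = bucket_index(mi, N, n, c)
        if k is not None:
            buckets.setdefault(k, set()).add(i)
    return buckets, N

def find_trough(buckets):
    """Smallest k below the last nonempty bucket with
    |B_1 u ... u B_k| >= |B_{k+1}|; None if the sizes keep doubling (Case 1)."""
    if not buckets:
        return None
    last = max(buckets)
    size = 0
    for k in range(1, last):
        size += len(buckets.get(k, ()))
        if size >= len(buckets.get(k + 1, ())):
            return k
    return None

upper = {0: 64, 1: 10, 2: 3, 3: 1}
lower = {0: -1, 1: -2, 2: -50, 3: 0}
B, N = make_buckets(upper, lower, c=1)   # c = 1 only to keep numbers small
k = find_trough(B)
```

Here element 2 lands in $B_1$ because of its very negative $\mathrm{lower}(2)$, and $B_2$ acts as the trough that separates $B_1$ from $B_3$.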

Case 1 is easier to handle, and is in fact a special case of Case 2. We first informally sketch the treatment of Case 1, which should shed some light on how we deal with Case 2.

We run Cutting Plane for $O(n \log^2 n)$ iterations (i.e. $\tau = O(\log n)$). By Theorem 82, our feasible
region P would be sandwiched by a pair of approximately parallel supporting hyperplanes of width
at most $1/n^{\Theta(\log n)}$. Now proceeding as in the last section, we would be able to find some $y$ in the
base polyhedron and some element $p$ such that $n^{\Theta(\log n)}|y_i| \le \mathrm{upper}(p)$. This gives

$$|y_i| \le \frac{\mathrm{upper}(p)}{n^{\Theta(\log n)}} \le \frac{N}{n^{\Theta(\log n)}}.$$

Since $\mathrm{upper}(i) \ge N/n^{O(\log n)}$ or $\mathrm{lower}(i) \le -N/n^{O(\log n)}$ for all $i$ in Case 1, we can then conclude
that some valid arc $(i, q)$ or $(q, i)$ can be added for every $i$. Thus we add $n/2$ arcs simultaneously
in one phase of the algorithm at the expense of blowing up the runtime by $O(\log n)$. This saves
a factor of $n/\log n$ from our runtime in the last section, and the amortized cost for an arc would
then be $\tilde O(n \cdot \mathrm{EO} + n^2)$.

On the other hand, in Case 2 we have a “trough” at $B_{k+1}$. Roughly speaking, this trough is
useful for acting as a soft boundary between $B_1 \cup \dots \cup B_k$ and $\bigcup_{l \ge k+2} B_l$. Recall that we are able
to “ignore” $\bigcup_{l \ge k+2} B_l$ because their $h_i$ is relatively small in absolute value. In particular, we know
that for any $p \in B_1 \cup \dots \cup B_k$ and $i \in B_l$, where $l \ge k + 2$,

$$\max\{\mathrm{upper}(p), -\mathrm{lower}(p)\} \ge n^{10}\max\{\mathrm{upper}(i), -\mathrm{lower}(i)\}.$$


This is possible because $B_{k+1}$, which is sandwiched in between, acts like a shield preventing $B_l$
from “messing with” $B_1 \cup \dots \cup B_k$. This property comes at the expense of sacrificing $B_{k+1}$, which must
confront $B_l$.

Furthermore, we require that $|B_1 \cup \dots \cup B_k| \ge |B_{k+1}|$, and run Cutting Plane on $B = (B_1 \cup
\dots \cup B_k) \cup B_{k+1}$. If $|B_{k+1}| \gg |B_1 \cup \dots \cup B_k|$, our effort would mostly be wasted on $B_{k+1}$, which is
sacrificed, and the amortized time complexity for $B_1 \cup \dots \cup B_k$ would then be large.

Before discussing the algorithm for Case 2, we need some preparatory work. 

*More precisely, $B_k = \emptyset$ for $k > \lceil \log n \rceil$.







15.4.2 Separating Hyperplane: Project and Lift 

Our speedup is achieved by running our cutting plane method on the projection of our feasible
region onto $B := (B_1 \cup \dots \cup B_k) \cup B_{k+1}$. More precisely, we start by running our cutting plane
on $P^B = \{x \in \mathbb{R}^B : \exists x' \in \mathbb{R}^{V \setminus B} \text{ s.t. } (x, x') \text{ satisfies (15.1)}\}$, which has a lower dimension. However,
to do this, we need to specify a separation oracle for $P^B$. Here we make one of the most natural
choices.

We begin by making an apparently immaterial change to our set of arcs A. Let us take the
transitive closure of A by adding the arc $(i, j)$ whenever there is a path from $i$ to $j$. Clearly this
would not change our ring family, as a path from $i$ to $j$ implies $j \in R(i)$. Roughly speaking, we do
this to handle pathological cases such as $(i, k), (k, j) \in A$, $(i, j) \notin A$ and $i, j \in B$, $k \notin B$. Without
introducing the arc $(i, j)$, we risk mistaking a solution containing $i$ but not $j$ for a feasible one, since we
are restricting our attention to $B$ and ignoring $k \notin B$.

Definition 88. Given a digraph $D = (V, A)$, the transitive closure of $A$ is the set of arcs $(i, j)$ for
which there is a directed path from $i$ to $j$. We say that $A$ is complete if it is equal to its transitive
closure.
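One straightforward way to compute the transitive closure of Definition 88 is a depth-first search from every vertex, which takes $O(n \cdot (n + |A|))$ time in total. The Python sketch below is illustrative (the function names are not from the text); the example reproduces the pathological case above, where $(i, k), (k, j) \in A$ forces the arc $(i, j)$.

```python
def transitive_closure(vertices, arcs):
    """Return the set of arcs (s, t) such that there is a directed path s -> t."""
    succ = {v: set() for v in vertices}
    for i, j in arcs:
        succ[i].add(j)
    closure = set()
    for s in vertices:
        # Depth-first search from s; every vertex reached via >= 1 arc is a descendant of s.
        stack = list(succ[s])
        seen = set()
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            stack.extend(succ[v])
        closure.update((s, t) for t in seen)
    return closure


def is_complete(vertices, arcs):
    """A is complete if it equals its transitive closure."""
    return set(arcs) == transitive_closure(vertices, arcs)
```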


Given $x \in [0,1]^B$, we define the completion of $x$ with respect to $A$ as follows.

Definition 89. Given $x \in [0,1]^B$ and a set of arcs $A$, $x^C \in [0,1]^n$ is a completion of $x$ if $x^C_B = x$
and $x^C_i \le x^C_j$ for every $(i, j) \in A$. Here $x^C_B$ denotes the restriction of $x^C$ to $B$.

Lemma 90. Given $x \in [0,1]^B$ and a complete set of arcs $A$, there is a completion of $x$ if $x_i \le x_j$
for every $(i, j) \in A \cap (B \times B)$. Moreover, it can be computed in $O(n^2)$ time.

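Proof. We set $x^C_B = x$. For $i \notin B$, we set

$$x^C_i = \begin{cases} 1 & \text{if } \nexists\, j \in B \text{ s.t. } (i, j) \in A,\\ \min\{x_j : j \in B,\ (i, j) \in A\} & \text{otherwise.} \end{cases}$$

One may verify that $x^C$ satisfies our requirement as $A$ is complete. For example, if $(i, j) \in A$ with $i \in B$ and
$j \notin B$, then either $x^C_j = 1 \ge x_i$, or $x^C_j = x_{j'}$ for some $j' \in B$ with $(j, j') \in A$; completeness gives
$(i, j') \in A \cap (B \times B)$, and hence $x_i \le x_{j'} = x^C_j$. Computing each $x^C_i$ takes $O(n)$ time. Since
$|V \setminus B| \le n$, computing the whole $x^C$ takes $O(n^2)$ time. □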
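The completion from the proof of Lemma 90 can be sketched in Python as follows. The dict-based representation is an illustrative choice; the sketch assumes, as the lemma does, that $A$ is complete and that $x_i \le x_j$ already holds on $A \cap (B \times B)$. Building successor lists once keeps the total cost at $O(n^2)$.

```python
from collections import defaultdict


def completion(x, arcs, ground_set):
    """Extend x (a dict on B) to a completion x^C on the whole ground set V.

    For i outside B: x^C_i = min{x_j : j in B, (i, j) in A}, or 1 if no such j exists.
    """
    succ = defaultdict(list)
    for i, j in arcs:            # O(|A|) = O(n^2)
        succ[i].append(j)
    xc = dict(x)                 # x^C restricted to B equals x
    for i in ground_set:
        if i not in x:
            # O(n) work per element outside B, O(n^2) in total
            xc[i] = min((x[j] for j in succ[i] if j in x), default=1.0)
    return xc
```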


This notion of completion is needed since our original separation oracle requires a full-dimensional
input $x$. Now that $x \in \mathbb{R}^B$, we need a way of extending it to $\mathbb{R}^n$ while retaining the crucial
property that $h$ is consistent with every arc in $A$.

Note that the runtime of each oracle call is still $O(n \cdot \mathrm{EO} + n^2 \log^{O(1)} n)$, as $x^C$ can be computed in $O(n^2)$ time by
the last lemma.

We remark that the hyperplane $h_B^T x_B \le \sum_{i \in B} h_i x_i$ returned by the oracle is not a valid separating
hyperplane (i.e. it may cut out the minimizers). Nevertheless, we will show that it is a decent
“proxy” for the true separating hyperplane $h^T x \le f(x^C) = \sum_{i \in V} h_i x^C_i$, good enough to serve
our purpose of sandwiching the remaining feasible region in a small strip. To get a glimpse, note
that the terms missing from $h_B^T x_B \le \sum_{i \in B} h_i x_i$ all involve $h_i$ for $i \notin B$, which is “negligible” compared
to $h_i$ for $i \in B_1 \cup \dots \cup B_k$.

One may try to make $h_B^T x_B \le \sum_{i \in B} h_i x_i$ valid, say, by $h_B^T x_B \le \sum_{i \in B} h_i x_i + \sum_{i \notin B} |h_i|$. The
problem is that such hyperplanes would no longer separate $x$, as $h_B^T x = \sum_{i \in B} h_i x_i <
\sum_{i \in B} h_i x_i + \sum_{i \notin B} |h_i|$ whenever the last sum is positive. Consequently, we lose the width (or volume) guarantee of our cutting
plane algorithm. Although this seems problematic, it is actually still possible to show a guarantee
sufficient for our purpose, as $\sum_{i \notin B} |h_i|$ is relatively small. We leave it as a nontrivial exercise to
interested readers.
Algorithm 6: Projected Separation Oracle
Input: $x \in \mathbb{R}^B$ and a complete set of arcs $A$
if $x_i < 0$ for some $i \in B$ then
    Output: $x_i \ge 0$
else if $x_j > 1$ for some $j \in B$ then
    Output: $x_j \le 1$
else if $x_i > x_j$ for some $(i, j) \in A \cap B^2$ then
    Output: $x_i \le x_j$
else
    Let $x^C \in \mathbb{R}^n$ be a completion of $x$
    Let $i_1, \dots, i_n$ be a permutation of $V$ such that $x^C_{i_1} \ge \dots \ge x^C_{i_n}$ and, for all $(i, j) \in A$, $j$
    precedes $i$ in $i_1, \dots, i_n$
    Output: $h_B^T x_B \le \sum_{i \in B} h_i x_i$, where $h$ is the BFS defined by the
    permutation $i_1, \dots, i_n$


In conclusion, it seems that one cannot have the best of both worlds: the hyperplane returned 
by the oracle cannot be simultaneously valid and separating. 
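Algorithm 6 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: `f` is a hypothetical evaluation oracle taking a frozenset, $A$ is assumed complete and acyclic, a hyperplane is returned as a pair `(coeffs, rhs)` meaning $\sum_i \texttt{coeffs}[i]\, y_i \le \texttt{rhs}$ over $y \in \mathbb{R}^B$, and ties in $x^C$ are broken by out-degree. That tie-break is our own choice: for a complete acyclic $A$, an arc $(i, j)$ makes the successor set of $j$ a strict subset of that of $i$, so $j$ precedes $i$ as Algorithm 6 requires.

```python
from collections import defaultdict


def projected_separation_oracle(x, arcs, ground_set, f):
    """Return (coeffs, rhs) describing the cut sum_i coeffs[i] * y_i <= rhs on R^B.

    x: dict i -> x_i for i in B; arcs: complete, acyclic set of pairs (i, j)
    encoding x_i <= x_j; ground_set: V; f: set function on frozensets (EO).
    """
    for i, xi in x.items():
        if xi < 0:
            return {i: -1}, 0                # cut x_i >= 0
    for j, xj in x.items():
        if xj > 1:
            return {j: 1}, 1                 # cut x_j <= 1
    for i, j in arcs:
        if i in x and j in x and x[i] > x[j]:
            return {i: 1, j: -1}, 0          # cut x_i <= x_j
    # Completion of x (Lemma 90): min over successors in B, or 1 if there are none.
    succ = defaultdict(list)
    for i, j in arcs:
        succ[i].append(j)
    xc = dict(x)
    for i in ground_set:
        if i not in x:
            xc[i] = min((x[j] for j in succ[i] if j in x), default=1)
    # Decreasing x^C; ties broken by out-degree so that j precedes i for every arc (i, j).
    order = sorted(ground_set, key=lambda i: (-xc[i], len(succ[i])))
    # BFS defined by the permutation: h_{i_t} = f({i_1..i_t}) - f({i_1..i_{t-1}}).
    h, prefix, prev = {}, set(), f(frozenset())
    for i in order:
        prefix.add(i)
        cur = f(frozenset(prefix))
        h[i] = cur - prev
        prev = cur
    coeffs = {i: h[i] for i in x}
    return coeffs, sum(h[i] * x[i] for i in x)
```

In the check below, $f(S) = 3|S| - |S|^2$ is a concave function of $|S|$ and hence submodular; it serves only as a toy evaluation oracle.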

Algorithm 

We take $k$ to be the first index for which $|B_1 \cup \dots \cup B_k| \ge |B_{k+1}|$, i.e. $|B_1 \cup \dots \cup B_l| < |B_{l+1}|$ for $l \le k - 1$.
Thus $k \le \log n$. Let $b = |B|$, so that $|B_1 \cup \dots \cup B_k| \ge b/2$. Case 1 is the special case $B = V$.
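The choice of $k$ and of $B$ can be sketched as follows, assuming (as an illustrative representation) that the buckets are given as a list of disjoint sets with index 0 holding $B_1$. The loop always terminates at the last bucket, since $B_{k+1}$ is empty beyond it; this is the case $B = V$.

```python
def choose_k(buckets):
    """Return (k, B) for the first k with |B_1 ∪ ... ∪ B_k| >= |B_{k+1}|.

    buckets: list of disjoint sets, buckets[0] = B_1, buckets[1] = B_2, ...
    B = B_1 ∪ ... ∪ B_{k+1} is the coordinate set on which Cutting Plane runs.
    """
    prefix_size = 0
    for k in range(1, len(buckets) + 1):
        prefix_size += len(buckets[k - 1])                        # |B_1 ∪ ... ∪ B_k|
        next_size = len(buckets[k]) if k < len(buckets) else 0   # |B_{k+1}|
        if prefix_size >= next_size:
            return k, set().union(*buckets[:k + 1])
    raise ValueError("at least one bucket is required")
```

By construction $|B_{k+1}| \le |B_1 \cup \dots \cup B_k|$, so the returned set has $b \le 2|B_1 \cup \dots \cup B_k|$.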

Our algorithm is summarized below. Here $A$ is always complete, as $A$ is replaced by its transitive
closure whenever a new valid arc is added.

1. Run Cutting Plane on $P^B = \{x \in \mathbb{R}^B : \exists x' \in \mathbb{R}^{V \setminus B} \text{ s.t. } (x, x') \text{ satisfies (15.1)}\}$ with the new
projected separation oracle.

2. Identify a pair of “narrow” approximately parallel supporting hyperplanes.

3. Deduce from the hyperplanes certain new constraints of the forms $x_i = 0$, $x_j = 1$, $x_i = x_j$ or
$x_i \le x_j$ by lifting separating hyperplanes back to $\mathbb{R}^n$.

4. Consolidate $A$ and $f$. If some $x_i \le x_j$ is added, replace $A$ by its transitive closure.

5. Repeat Step 1 with updated $A$ and $f$. (Any previously found separating hyperplanes are
discarded.)

The minimizer can be constructed by unraveling the recursion. 

First of all, to be able to run Cutting Plane on $P^B$, we must come up with a polyhedral
description of $P^B$, which consists of just the constraints involving $B$. This is shown in the next
lemma.

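Lemma 91. Let $P^B = \{x \in \mathbb{R}^B : \exists x' \in \mathbb{R}^{V \setminus B} \text{ s.t. } (x, x') \text{ satisfies (15.1)}\}$. Then

$$P^B = \{x \in \mathbb{R}^B : 0 \le x \le 1,\ x_i \le x_j\ \forall (i, j) \in A \cap (B \times B)\}.$$

Proof. It is clear that $P^B \subseteq \{x \in \mathbb{R}^B : 0 \le x \le 1,\ x_i \le x_j\ \forall (i, j) \in A \cap (B \times B)\}$, as the constraints
$0 \le x \le 1$ and $x_i \le x_j\ \forall (i, j) \in A \cap (B \times B)$ all appear in (15.1).

Conversely, for any $x \in \mathbb{R}^B$ satisfying $0 \le x \le 1$ and $x_i \le x_j\ \forall (i, j) \in A \cap (B \times B)$, we know there is
some completion $x^C$ of $x$ by Lemma 90, as $A$ is complete. Now $x^C$ satisfies (15.1) by definition, and
hence $x \in P^B$. □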
The only place where we have really changed the algorithm is Step (3).

15.4.3 Deducing New Constraints $x_i = 0$, $x_j = 1$, $x_i = x_j$ or $x_i \le x_j$
Our method will deduce one of the following: 

• $x_i = 0$, $x_j = 1$ or $x_i = x_j$

• for each $p \in B_1 \cup \dots \cup B_k$, $x_p \le x_q$ for some $q \notin R(p)$ or $x_p \ge x_q$ for some $q \notin Q(p)$

Our argument is very similar to the last section’s. Roughly speaking, it is the same argument but
with “noise” introduced by $i \notin B$. We make extensive use of the notation from the last section.

Our main tool is again Theorem 82. Note that $n$ should be replaced by $b$ in the theorem
statement. We invoke it with $\tau = k \log_b n = O(\log^2 n)$ (using $k \le \log n$) to get a width of
$1/b^{\Theta(\tau)} = 1/n^{\Theta(k)}$. This takes time at most $O(bn \log^2 n \cdot \mathrm{EO} + bn^2 \log^{O(1)} n)$. Again, this is intuitively
clear, as we run it for $O(kb \log n)$ iterations, each of which takes time $O(n \cdot \mathrm{EO} + n^2 \log^{O(1)} n)$.
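The parameter choice can be checked with a short calculation. The iteration count in the second line is an inference, not a quotation of Theorem 82: it assumes that the theorem in dimension $b$ uses $O(b\tau\log b)$ iterations, which is consistent with the Case 1 count of $O(n\log^2 n)$ iterations for $\tau = O(\log n)$ when $b = n$.

```latex
\begin{align*}
b^{\tau} &= b^{k\log_b n} = n^{k}
  && \Longrightarrow\ \text{width } 1/b^{\Theta(\tau)} = 1/n^{\Theta(k)},\\
O(b\,\tau\log b) &= O(b\cdot k\log_b n\cdot\log b) = O(kb\log n)
  && \text{iterations},\\
O(kb\log n)\cdot O\big(n\cdot\mathrm{EO} + n^{2}\log^{O(1)} n\big)
  &= O\big(bn\log^{2} n\cdot\mathrm{EO} + bn^{2}\log^{O(1)} n\big)
  && \text{using } k \le \log n .
\end{align*}
```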

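After each phase (of roughly $O(kb \log n)$ iterations) of Cutting Plane, the feasible region $P$ is sandwiched between
a pair of approximately parallel supporting hyperplanes $F$ and $F'$ which have width $1/n^{\Theta(k)}$. Let $F$
and $F'$ be

$$c^T x_B = \sum_{i \in B} c_i x_i \le M, \qquad c'^T x_B = \sum_{i \in B} c'_i x_i \le M',$$

such that

$$|M + M'|,\ \|c + c'\|_2 \le \mathrm{gap}, \quad \text{where } \mathrm{gap} = \frac{1}{n^{\Theta(k)}}\min\{\|c\|_2, \|c'\|_2\}.$$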

The rest of this section presents an execution of the ideas discussed above. All of our work is
basically geared towards bringing the amortized cost for identifying a valid arc down to $\tilde O(n \cdot
\mathrm{EO} + n^2)$. Again, we can write these two constraints as nonnegative combinations of the constraints of $P^B$
and of the projected separating hyperplanes; let $H$ be the set of BFS's $h$ used to construct the latter. Here $x^C_h$ is
the completion of the point $x_h$ used to construct $h_B^T x_B \le h_B^T (x^C_h)_B$. (Recall that $(x^C_h)_B$ is the
restriction of $x^C_h$ to $B$.)


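$$c^T x_B = -\sum_{i \in B} \alpha_i x_i + \sum_{j \in B} \beta_j x_j + \sum_{(i,j) \in A \cap B^2} \gamma_{ij}(x_i - x_j) + \sum_{h \in H} \lambda_h h_B^T x_B \quad \text{and} \quad M = \sum_{j \in B} \beta_j + \sum_{h \in H} \lambda_h h_B^T (x^C_h)_B,$$

$$c'^T x_B = -\sum_{i \in B} \alpha'_i x_i + \sum_{j \in B} \beta'_j x_j + \sum_{(i,j) \in A \cap B^2} \gamma'_{ij}(x_i - x_j) + \sum_{h \in H} \lambda'_h h_B^T x_B \quad \text{and} \quad M' = \sum_{j \in B} \beta'_j + \sum_{h \in H} \lambda'_h h_B^T (x^C_h)_B.$$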

As we have discussed, the problem is that the separating hyperplanes $h_B^T x_B \le h_B^T (x^C_h)_B$ are not
actually valid. We can, however, recover their valid counterparts by lifting them back to $h^T x \le h^T x^C_h$.
The hope is that $h_B^T x_B \le h_B^T (x^C_h)_B$ and $h^T x \le h^T x^C_h$ are not too different, so that the arguments
will still go through. We show that this is indeed the case.

Again, we scale $c, c', \alpha, \alpha', \beta, \beta', \gamma, \gamma', \lambda, \lambda'$ so that

$$\sum_{h \in H} (\lambda_h + \lambda'_h) = 1.$$

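By adding all the constituent (lifted) separating hyperplane inequalities, we get

$$\sum_{h \in H} \lambda_h h^T x + \sum_{h \in H} \lambda'_h h^T x \le \sum_{h \in H} \lambda_h h^T x^C_h + \sum_{h \in H} \lambda'_h h^T x^C_h.$$

Let

$$\mathrm{LHS} = \sum_{i \in B} \alpha_i x_i + \sum_{i \in B} \alpha'_i x_i - \sum_{j \in B} \beta_j x_j - \sum_{j \in B} \beta'_j x_j + \sum_{(i,j) \in A \cap B^2} \gamma_{ij}(x_j - x_i) + \sum_{(i,j) \in A \cap B^2} \gamma'_{ij}(x_j - x_i).$$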

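Here we know that

$$\sum_{h \in H} \lambda_h h^T x + \sum_{h \in H} \lambda'_h h^T x = \mathrm{LHS} + (c + c')^T x_B + \sum_{h \in H} \lambda_h h_{V \setminus B}^T x_{V \setminus B} + \sum_{h \in H} \lambda'_h h_{V \setminus B}^T x_{V \setminus B}$$

and

$$\sum_{h \in H} \lambda_h h^T x^C_h + \sum_{h \in H} \lambda'_h h^T x^C_h = (M + M') + \sum_{h \in H} \lambda_h h_{V \setminus B}^T (x^C_h)_{V \setminus B} + \sum_{h \in H} \lambda'_h h_{V \setminus B}^T (x^C_h)_{V \setminus B} - \sum_{j \in B} \beta_j - \sum_{j \in B} \beta'_j.$$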

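Combining all yields

$$\mathrm{LHS} + (c + c')^T x_B + \sum_{h \in H} \lambda_h h_{V \setminus B}^T x_{V \setminus B} + \sum_{h \in H} \lambda'_h h_{V \setminus B}^T x_{V \setminus B} \le (M + M') + \sum_{h \in H} \lambda_h h_{V \setminus B}^T (x^C_h)_{V \setminus B} + \sum_{h \in H} \lambda'_h h_{V \setminus B}^T (x^C_h)_{V \setminus B} - \sum_{j \in B} \beta_j - \sum_{j \in B} \beta'_j.$$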

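Here $(c + c')^T x_B$ can be bounded as before: $(c + c')^T x_B \ge -\sqrt{n}\,\|c + c'\|_2 \ge -\sqrt{n}\,\mathrm{gap}$. Since
$M + M' \le \mathrm{gap}$, we then obtain

$$\mathrm{LHS} + \sum_{h \in H} \lambda_h h_{V \setminus B}^T x_{V \setminus B} + \sum_{h \in H} \lambda'_h h_{V \setminus B}^T x_{V \setminus B} \le 2\sqrt{n}\,\mathrm{gap} + \sum_{h \in H} \lambda_h h_{V \setminus B}^T (x^C_h)_{V \setminus B} + \sum_{h \in H} \lambda'_h h_{V \setminus B}^T (x^C_h)_{V \setminus B} - \sum_{j \in B} \beta_j - \sum_{j \in B} \beta'_j.$$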

We should expect the contribution from $h_{V \setminus B}$ to be small, as $h_i$ for $i \notin B$ is small compared to
$h_i$ for $i \in B_1 \cup \dots \cup B_k$. We formalize our argument in the next two lemmas.

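Lemma 92. We have $\sum_{h \in H} \lambda_h h_{V \setminus B}^T (x^C_h)_{V \setminus B} + \sum_{h \in H} \lambda'_h h_{V \setminus B}^T (x^C_h)_{V \setminus B} \le N/n^{10(k+1)-1}$.

Proof. We bound each component of this sum. For $i \notin B$, we have
$\mathrm{upper}(i) < N/n^{10(k+1)}$. By Lemma 78, $h_i \le \mathrm{upper}(i)$. Therefore, since each $(x^C_h)_i \in [0, 1]$,

$$\sum_{h \in H} \lambda_h h_i (x^C_h)_i + \sum_{h \in H} \lambda'_h h_i (x^C_h)_i \le \frac{N}{n^{10(k+1)}} \Big(\sum_{h \in H} \lambda_h + \sum_{h \in H} \lambda'_h\Big) = \frac{N}{n^{10(k+1)}}.$$

Our result then follows since there are at most $n$ elements $i \notin B$ and

$$\sum_{h \in H} \lambda_h h_{V \setminus B}^T (x^C_h)_{V \setminus B} + \sum_{h \in H} \lambda'_h h_{V \setminus B}^T (x^C_h)_{V \setminus B} = \sum_{i \notin B} \Big(\sum_{h \in H} \lambda_h h_i (x^C_h)_i + \sum_{h \in H} \lambda'_h h_i (x^C_h)_i\Big).$$

□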


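Lemma 93. We have $\sum_{h \in H} \lambda_h h_{V \setminus B}^T x_{V \setminus B} + \sum_{h \in H} \lambda'_h h_{V \setminus B}^T x_{V \setminus B} \ge -N/n^{10(k+1)-1}$.

Proof. The proof is almost identical to that of the last lemma, except that we use $h_i \ge \mathrm{lower}(i)$ instead of
$h_i \le \mathrm{upper}(i)$, and $\mathrm{lower}(i) > -N/n^{10(k+1)}$. □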


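The two lemmas above imply that

$$\mathrm{LHS} \le 2\sqrt{n}\,\mathrm{gap} - \sum_{j \in B} \beta_j - \sum_{j \in B} \beta'_j + \frac{2N}{n^{10(k+1)-1}} = \mathrm{gap}' - \sum_{j \in B} \beta_j - \sum_{j \in B} \beta'_j,$$

where $\mathrm{gap}' = 2\sqrt{n}\,\mathrm{gap} + 2N/n^{10(k+1)-1}$.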
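Lemma 94. Suppose $x$ satisfies (15.1) and $\mathrm{LHS} \le \mathrm{gap}' - \sum_j \beta_j - \sum_j \beta'_j$, where $\alpha_i, \alpha'_i, \beta_j, \beta'_j, \gamma_{ij}, \gamma'_{ij} \ge 0$.

1. If $\alpha_i > \mathrm{gap}'$ or $\alpha'_i > \mathrm{gap}'$, then $x_i < 1$.

2. If $\beta_j > \mathrm{gap}'$ or $\beta'_j > \mathrm{gap}'$, then $x_j > 0$.

3. If $\gamma_{ij} > \mathrm{gap}'$ or $\gamma'_{ij} > \mathrm{gap}'$, then $0 \le x_j - x_i < 1$.

Proof. The proof is exactly the same as that of Lemma 84 with $2\sqrt{n}\,\mathrm{gap}$ replaced by $\mathrm{gap}'$. □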
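Checking the three conditions of Lemma 94 is a single pass over the multipliers of $F$ and $F'$. A minimal sketch, assuming the multipliers are available as dicts (a hypothetical representation, with missing keys meaning 0) and that the target point is integral, so that $x_i < 1$ means $x_i = 0$, $x_j > 0$ means $x_j = 1$, and $0 \le x_j - x_i < 1$ means $x_i = x_j$:

```python
def deduce_constraints(alpha, alpha_p, beta, beta_p, gamma, gamma_p, gap_prime):
    """List the constraints that Lemma 94 yields for integral x.

    alpha, beta: dict element -> multiplier of -x_i <= 0 and x_j <= 1 in F;
    gamma: dict arc (i, j) -> multiplier of x_i - x_j <= 0 in F;
    alpha_p, beta_p, gamma_p: the same for F'.
    """
    found = []
    for i in sorted(alpha.keys() | alpha_p.keys()):
        if max(alpha.get(i, 0), alpha_p.get(i, 0)) > gap_prime:
            found.append(("x_i = 0", i))        # item 1: x_i < 1
    for j in sorted(beta.keys() | beta_p.keys()):
        if max(beta.get(j, 0), beta_p.get(j, 0)) > gap_prime:
            found.append(("x_j = 1", j))        # item 2: x_j > 0
    for arc in sorted(gamma.keys() | gamma_p.keys()):
        if max(gamma.get(arc, 0), gamma_p.get(arc, 0)) > gap_prime:
            found.append(("x_i = x_j", arc))    # item 3: 0 <= x_j - x_i < 1
    return found
```

An empty result corresponds to assumption (15.9) below.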

From now on we may assume that

$$\max\{\alpha_i, \alpha'_i, \beta_j, \beta'_j, \gamma_{ij}, \gamma'_{ij}\} \le \mathrm{gap}'. \qquad (15.9)$$

Lemma 95. Let $y = \sum_{h \in H} \lambda_h h$ and $y' = \sum_{h \in H} \lambda'_h h$. Then

$$N \ge n^{10k+4}\,\|y_B + y'_B\|_\infty,$$

assuming (15.9).

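Proof. Recall that $\|c + c'\|_2 \le \mathrm{gap} \le \mathrm{gap}'$, where $\mathrm{gap} = n^{-\Theta(k)}\min\{\|c\|_2, \|c'\|_2\}$ and $\mathrm{gap}' = 2\sqrt{n}\,\mathrm{gap} + 2N/n^{10(k+1)-1}$.

Now there are two cases.

Case 1: $2\sqrt{n}\,\mathrm{gap} \ge 2N/n^{10(k+1)-1}$. Then $\mathrm{gap}' \le 4\sqrt{n}\,\mathrm{gap}$ and we follow the same proof as
Lemma 85. We have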


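$$c = y_B - \sum_i \alpha_i 1_i + \sum_j \beta_j 1_j + \sum_{(i,j)} \gamma_{ij}(1_i - 1_j) \quad \text{and} \quad c' = y'_B - \sum_i \alpha'_i 1_i + \sum_j \beta'_j 1_j + \sum_{(i,j)} \gamma'_{ij}(1_i - 1_j).$$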

By (15.9) we know that $\|c - y_B\|_2 \le 4n^2\,\mathrm{gap}' \le \frac{1}{2}\|c\|_2$ and $\|c' - y'_B\|_2 \le 4n^2\,\mathrm{gap}' \le \frac{1}{2}\|c'\|_2$.

Consequently, by the triangle inequality we have that

$$\|y_B + y'_B\|_2 \le \|c + c'\|_2 + \|c - y_B\|_2 + \|c' - y'_B\|_2 \le 9n^2\,\mathrm{gap}'$$

and

$$\|c\|_2 \le \|c - y_B\|_2 + \|y_B\|_2 \le \tfrac{1}{2}\|c\|_2 + \|y_B\|_2, \quad \text{so} \quad \|c\|_2 \le 2\|y_B\|_2.$$

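Similarly, we have that $\|c'\|_2 \le 2\|y'_B\|_2$. Consequently, since $\mathrm{gap}' \le 4\sqrt{n}\,\mathrm{gap} \le n^{-\Theta(k)}\min\{\|c\|_2, \|c'\|_2\}$, we have
that

$$\|y_B + y'_B\|_2 \le \frac{18}{n^{\Theta(k)}}\min\{\|y_B\|_2, \|y'_B\|_2\},$$

and thus, invoking Lemma 81 yields $N \ge \mathrm{upper}(p) \ge n^{10k+4}\|y_B + y'_B\|_\infty$, as desired.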

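Case 2: $2\sqrt{n}\,\mathrm{gap} < 2N/n^{10(k+1)-1}$. Then for any $i \in B$, $|c_i + c'_i| \le \|c + c'\|_2 \le \mathrm{gap} < N/n^{10(k+1)-1/2}$.

Since

$$y_B + y'_B = (c + c') + \sum_i (\alpha_i + \alpha'_i) 1_i - \sum_j (\beta_j + \beta'_j) 1_j - \sum_{(i,j)} (\gamma_{ij} + \gamma'_{ij})(1_i - 1_j),$$

and each coordinate of $B$ is touched by at most $2b \le 2n$ of these terms, each with coefficient at most $2\,\mathrm{gap}'$ by (15.9), we have

$$\|y_B + y'_B\|_\infty \le \|c + c'\|_\infty + 4n\,\mathrm{gap}' \le N/n^{10k+4}.$$

□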
Corollary 96. Let $P$ be the feasible region after running Cutting Plane on (15.1) with the projected
separation oracle. Then one of the following holds:

1. We found a BFS $h$ with $h_B = 0$.

2. The integral points of $P$ all lie on some hyperplane $x_i = 0$, $x_j = 1$ or $x_i = x_j$.

3. Let $H$ be the collection of BFS's $h$ used to construct our separating hyperplanes for $P$. Then
there is a convex combination $y$ of $H$ such that for $p \in B_1 \cup \dots \cup B_k$, we have $n^4|y_i| \le \mathrm{upper}(p)$
or $\mathrm{lower}(p) \le -n^4|y_i|$ for all $i$.

Proof. As mentioned before, (1) happens if some separating hyperplane satisfies $h_B = 0$ when
running cutting plane on the non-negligible coordinates. We have (2) if some condition in Lemma
94 holds. Otherwise, we claim that $y + y' = \sum_{h \in H}(\lambda_h + \lambda'_h) h$, with $y$ and $y'$ as in Lemma 95, is a candidate for Case 3. It is a convex
combination of BFS's, and by Lemma 95, for the big elements $i \in B$ we have

$$|y_i + y'_i| \le \frac{N}{n^{10k+4}} \le \frac{1}{n^4}\max\{\mathrm{upper}(p), -\mathrm{lower}(p)\},$$

where the last inequality holds since for $p \in B_1 \cup \dots \cup B_k$, $\max\{\mathrm{upper}(p), -\mathrm{lower}(p)\} \ge N/n^{10k}$.

On the other hand, for the small elements $i \notin B$, Lemma 78 gives $|y_i + y'_i| \le \max\{\mathrm{upper}(i), -\mathrm{lower}(i)\} < N/n^{10(k+1)} \le \frac{1}{n^4}\max\{\mathrm{upper}(p), -\mathrm{lower}(p)\}$,
as desired. □

The gap is then small enough to add an arc for each $p \in B_1 \cup \dots \cup B_k$ by Lemmas 79 and
80. Therefore we can add a total of $|B_1 \cup \dots \cup B_k|/2 \ge b/4$ arcs with roughly $O(kb \log n) = \tilde O(b)$
iterations of Cutting Plane, each of which takes $\tilde O(n \cdot \mathrm{EO} + n^2)$. That is, the amortized cost for
each arc is $\tilde O(n \cdot \mathrm{EO} + n^2)$. We give a more formal time analysis below, but it should already be somewhat
clear why we have the desired time complexity.

Lemma 97. Suppose there is a convex combination $y$ of $H$ such that for $p \in B_1 \cup \dots \cup B_k$, we
have $n^4|y_i| \le \mathrm{upper}(p)$ or $\mathrm{lower}(p) \le -n^4|y_i|$ for all $i$. Then we can identify at least $b/4$ new valid
arcs.

Proof. We have $|H| = O(n)$ since $H$ is the set of BFS's used for the constraints of $P$, which has
$O(n)$ constraints. By Lemmas 79 and 80, for $p \in B_1 \cup \dots \cup B_k$ we can add a new valid arc $(p, q)$
or $(q, p)$. However, note that a new arc $(p_1, p_2)$ may be added twice, by both $p_1$ and $p_2$. Therefore the
total number of new arcs is at least $|B_1 \cup \dots \cup B_k|/2 \ge b/4$. □

15.4.4 Running Time 

Not many changes to the previous runtime analysis are needed. To avoid repetition, various details
already present in the corresponding part of the last section are omitted. Recall that $k \le \log n$ and, of
course, $b \le n$.

For each (roughly) $O(kb \log n)$ iterations of Cutting Plane we either get $x_i = 0$, $x_i = 1$, $x_i = x_j$
or $b/4$ constraints $x_i \le x_j$. The former can happen at most $n$ times, while in the latter case the amortized
cost of each arc is $O(k \log n)$ iterations of Cutting Plane. In the worst case the overall number
of iterations required is $\tilde O(n^2)$. Thus our algorithm has a runtime of $\tilde O(n^3 \cdot \mathrm{EO} + n^4)$, since each
iteration takes $\tilde O(n \cdot \mathrm{EO} + n^2)$, as shown below.
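A worked tally of these counts follows, using the per-phase cost $O(bn\log^2 n\cdot\mathrm{EO} + bn^2\log^{O(1)} n)$ and the $O(n^2)$-per-arc transitive-closure update from the proof of Theorem 98. It only rearranges bounds stated in this section, with $b \le n$ and at most $O(n^2)$ arcs overall.

```latex
\begin{align*}
\text{one phase} &= O\big(bn\log^{2} n\cdot\mathrm{EO} + bn^{2}\log^{O(1)} n\big),\\
\text{phases giving } x_i=0,\ x_i=1 \text{ or } x_i=x_j\ (\le n \text{ of them})
  &\le n\cdot O\big(n^{2}\log^{2} n\cdot\mathrm{EO} + n^{3}\log^{O(1)} n\big),\\
\text{phases giving} \ge b/4 \text{ arcs}
  &\le \underbrace{O(n^{2})}_{\text{arcs}}\cdot
       \underbrace{O\big(n\log^{2} n\cdot\mathrm{EO} + n^{2}\log^{O(1)} n\big)}_{\text{amortized cost per arc}},\\
\text{transitive-closure updates} &\le O(n^{2})\cdot O(n^{2}) = O(n^{4}),\\
\text{total} &= O\big(n^{3}\log^{2} n\cdot\mathrm{EO} + n^{4}\log^{O(1)} n\big).
\end{align*}
```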

Theorem 98. Our algorithm runs in time $O(n^3 \log^2 n \cdot \mathrm{EO} + n^4 \log^{O(1)} n)$.
Proof. We use Corollary 96. First we note that Case 1 can actually be integrated into Case 3, since
$\max\{\mathrm{upper}(p), -\mathrm{lower}(p)\} \ge N/n^{10k} > n^{10}|h_i|$ for $i \notin B$ (and $h_i = 0$ for $i \in B$).

As we have argued at the beginning of the last section, Theorem 82 with $\tau = k \log_b n$ implies
that the runtime for each phase is $O(bn \log^2 n \cdot \mathrm{EO} + bn^2 \log^{O(1)} n)$. In each phase we either get
$x_i = 0$, $x_i = 1$, $x_i = x_j$ (Case 2) or $b/4$ constraints $x_i \le x_j$ (Case 3), the latter of which follows from Corollary
96 and Lemma 97.

Case 2 can only happen $n$ times. Thus the total cost of these phases is at most $O(n^3 \log^2 n \cdot \mathrm{EO} + n^4 \log^{O(1)} n)$.

The overhead cost is also small. Similar to before, given $F$ and $F'$ represented as a nonnegative
combination of facets, we can check for the conditions in Lemma 94 in $O(n)$ time, as $P$ has only
this many facets. This settles Case 2.

For Case 3 the amortized cost for each arc is $O(n \log^2 n \cdot \mathrm{EO} + n^2 \log^{O(1)} n)$. Our desired runtime
follows since there are only $O(n^2)$ arcs to add. Unlike Case 2, some extra care is needed to handle
the overhead cost. The time needed to deduce a new arc (applying Lemmas 79 and 80 to $y$ and
$p \in B_1 \cup \dots \cup B_k$) is still $O(n \cdot \mathrm{EO} + n^2)$. But as soon as we get a new arc, we must update $A$ to
be its transitive closure so that it is still complete. Given $A$ complete and a new arc $(p, q) \notin A$, we
can simply add the arcs from $p$ and each of its ancestors to $q$ and each of its descendants. There are
at most $O(n^2)$ arcs to add, so this takes $O(n^2)$ time per arc, which is acceptable. □
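The closure update in the last paragraph of the proof can be sketched in Python as follows. It assumes, as an illustration, that arcs are stored as a set of pairs and that the new arc $(p, q)$ does not close a directed cycle. Because $A$ is complete, the direct predecessors of $p$ are exactly its ancestors and the direct successors of $q$ are exactly its descendants, so two linear scans plus at most $n^2$ new pairs give the $O(n^2)$ bound per arc.

```python
def add_arc_and_close(arcs, p, q):
    """Add (p, q) to the complete arc set `arcs` (modified in place), keeping it complete.

    Every u in {p} ∪ ancestors(p) now reaches every w in {q} ∪ descendants(q).
    Returns the set of arcs that were actually added.
    """
    ancestors = {u for (u, v) in arcs if v == p} | {p}      # O(|A|) scan
    descendants = {w for (v, w) in arcs if v == q} | {q}    # O(|A|) scan
    new_arcs = {(u, w) for u in ancestors for w in descendants} - arcs  # <= n^2 pairs
    arcs |= new_arcs
    return new_arcs
```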

16 Discussion and Comparison with Previous Algorithms 

We compare and contrast our algorithms with the previous ones. We focus primarily on strongly 
polynomial time algorithms. 

Convex combination of BFS’s 

All of the previous algorithms maintain a convex combination of BFS’s and iteratively improve
on it to get a better primal solution. In particular, the new BFS’s used are typically obtained by
making local changes to existing ones. Our algorithms, on the other hand, consider the geometry
of the existing BFS’s. The weighted “influences”† then collectively govern the choice of the next
BFS. We believe that this is the main driving force for the speedup of our algorithms.

Scaling schemes 

Many algorithms for combinatorial problems explicitly or implicitly scale a potential
function or a parameter. In this paper, our algorithms in some sense aim to minimize the volume
of the feasible region. Scaling schemes for different potential functions and parameters were also
designed in previous works [56, 54, 60, 53]. All of these functions and parameters have an explicit
form. In contrast, our potential function is somewhat unusual in the sense that it has no
closed form.

Deducing new constraints 

As mentioned in the main text, our algorithms share the same skeleton and tools for deducing 
new constraints with [56, 54, 60, 53]. Nevertheless, there are differences in the way these tools 
are employed. Our algorithms proceed by invoking them in a geometric manner, whereas previous 
algorithms were mostly combinatorial. 

Big elements and bucketing 

Our bucketing idea has roots in Iwata-Orlin’s algorithm [60] but is much more sophisticated. For
instance, it is sufficient for their algorithm to consider only big elements, i.e. $\mathrm{upper}(i) > N/n^{O(1)}$.
Our algorithm, on the other hand, must carefully group elements by the size of both upper(i) and
lower(i). The speedup appears impossible without these new ideas. We do, however, note that it
is unfair to expect such a sophisticated scheme in Iwata-Orlin’s algorithm, as it would not lead to
a speedup. In other words, their method is fully sufficient for their purposes, and the simplicity in
their case is a virtue rather than a shortcoming.

†In the terminology of Part I, these weighted influences are the leverage scores.

16.1 Open Problems 

One natural open problem is improving our weakly polynomial algorithm to $O(n^2 \log M \cdot \mathrm{EO} +
n^3 \log^{O(1)} n \cdot \log M)$ time. Our application of center of mass to SFM demonstrates that it should
be possible.

For strongly polynomial algorithms, the existential result of Theorem 71 shows that SFM can
be solved with $O(n^3 \log n \cdot \mathrm{EO})$ oracle calls. Unfortunately, our algorithm incurs an overhead of
$\log n$, as there can be as many as $\log n$ buckets each time. One may try to remove this $\log n$ overhead
by designing a better bucketing scheme or by arguing that more arcs can be added.

The other $\log n$ overhead seems much trickier to remove. Our method currently makes crucial
use of the tools developed by [56], where the $\log n$ factors in the runtime seem inevitable. We
suspect that our algorithm may have an analogue similar to [93, 90], which do not carry any $\log n$
overhead in the running time.

Perhaps an even more interesting open problem is whether our algorithm is optimal (up to
polylogarithmic factors). There are grounds for optimism. So far the best way of certifying the
optimality of a given solution $S \subseteq V$ is to employ duality and express some optimal solution to the
base polyhedron as a convex combination of $n + 1$ BFS’s. This already takes $n^2$ oracle calls, as each
BFS requires $n$. Thus one would expect the optimal number of oracle calls needed for SFM to be
at least $n^2$. Our bound is not too far off from it, and anything strictly between $n^2$ and $n^3$ seems
instinctively unnatural.